Posted to issues@hive.apache.org by "david (JIRA)" <ji...@apache.org> on 2016/01/12 07:11:40 UTC
[jira] [Updated] (HIVE-12844) hive-1.2.1 doesn't return correct value when run select count query

     [ https://issues.apache.org/jira/browse/HIVE-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

david updated HIVE-12844:
-------------------------
    Description: 
In HBase 1.0.2, I created a table 'test1' with the following rows and values:
hbase(main):027:0> scan 'test1'
ROW                                               COLUMN+CELL                                                                                                                                    
 a1                                               column=df1:a2, timestamp=1452505991743, value=ddd                                                                                              
 a1                                               column=df1:a3, timestamp=1452506082723, value=eee                                                                                              
 a1                                               column=df1:c2, timestamp=1452505705391, value=bbb                                                                                              
 b1                                               column=df1:a2, timestamp=1452505838737, value=ccc                                                                                              
 b1                                               column=df1:a3, timestamp=1452506149461, value=fff                                                                                              
 r1                                               column=df1:a, timestamp=1452507261849, value=hhh                                                                                               
 r1                                               column=df1:a1, timestamp=1452507100774, value=ggg                                                                                              
 r1                                               column=df1:c1, timestamp=1451221711588, value=aaa

Then I created an external table in Hive 1.2.1:
create external table test3(
          key string,
          coll string,
          col2 string,
          col3 string,
          col4 string,
          col5 string,
          col6 string)
          STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
          WITH SERDEPROPERTIES
          ("hbase.columns.mapping" =
          ":key,df1:a,df1:1,df1:a2,df1:a3,df1:c1,df1:c2")
          TBLPROPERTIES("hbase.table.name" = "test1"); 

When I run this query in Hive:
hive> select * from test3;
OK
a1      NULL    NULL    ddd     eee     NULL    bbb
b1      NULL    NULL    ccc     fff     NULL    NULL
r1      hhh     NULL    NULL    NULL    aaa     NULL
The result is correct, but when I run:
select count(1) from test3;
Total MapReduce CPU Time Spent: 6 seconds 770 msec
OK
1
It returns "1", although the table has 3 rows. It seems that rows where the first mapped column (coll, mapped to df1:a) is NULL are not counted.
Could you help analyze this?
By the way, the Hadoop version is 2.6.0.
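
To make the expected result concrete, here is a minimal Python sketch that projects the scanned cells through the column mapping from the DDL. It is only an illustration of the expected row count, not how Hive or the HBase storage handler works internally. The names CELLS, MAPPING and project_rows are hypothetical and exist only for this example; the data is copied from the scan output above.

```python
# Cells copied from the `scan 'test1'` output: (row key, qualifier) -> value.
CELLS = {
    ("a1", "df1:a2"): "ddd",
    ("a1", "df1:a3"): "eee",
    ("a1", "df1:c2"): "bbb",
    ("b1", "df1:a2"): "ccc",
    ("b1", "df1:a3"): "fff",
    ("r1", "df1:a"): "hhh",
    ("r1", "df1:a1"): "ggg",
    ("r1", "df1:c1"): "aaa",
}

# The "hbase.columns.mapping" value from the DDL, without the leading :key.
# Note: "df1:1" matches no cell in the scan (the table has "df1:a1"),
# so the second mapped column is NULL for every row.
MAPPING = ["df1:a", "df1:1", "df1:a2", "df1:a3", "df1:c1", "df1:c2"]


def project_rows(cells, mapping):
    """Build one tuple per HBase row key; missing cells become None."""
    row_keys = sorted({key for key, _ in cells})
    return [
        (key, *(cells.get((key, qualifier)) for qualifier in mapping))
        for key in row_keys
    ]


rows = project_rows(CELLS, MAPPING)
for row in rows:
    print("\t".join("NULL" if value is None else value for value in row))

# count(1) counts rows regardless of NULLs, so the expected answer is len(rows).
print("expected count(1):", len(rows))

# Counting only rows whose first mapped column is non-NULL gives the reported 1.
print("rows with non-NULL df1:a:", sum(1 for r in rows if r[1] is not None))
```

The printed rows match the select * output, and the expected count is 3, while only one row (r1) has a non-NULL value in the first mapped column.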


> hive-1.2.1 doesn't return correct value when run select count query
> -------------------------------------------------------------------
>
>                 Key: HIVE-12844
>                 URL: https://issues.apache.org/jira/browse/HIVE-12844
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: david
>            Priority: Critical


