You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Rupinder Singh <rs...@care.com> on 2013/03/19 08:51:51 UTC

Select queries on hbase table with composite key return wrong number of rows

I have an hbase table created as follows:
create 'event', {NAME => 'm', VERSIONS => 1}, {NAME => 'e', VERSIONS => 1}

I have a hive table mapped to the hbase table defined as follows:
CREATE EXTERNAL TABLE h_event(key struct<name:string,dateCreated:string,userId:string>, dummy string, dummy2 string)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY '~'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,m:dum,e:dum2")
TBLPROPERTIES ("hbase.table.name" = "event");

(You may notice that this is as per the workaround suggested in https://issues.apache.org/jira/browse/HIVE-2599 by Swarnim Kulkarni)

I am loading data in the table from a text file by doing an 'INSERT OVERWRITE TABLE h_event ...' statement.
Data is loaded successfully without errors and a count 'event' done in hbase shell returns the correct number of expected records.

Environment:
I am on hive 0.8.1 and hbase 0.92.0, both running on Debian/Squeeze.

Problem:
"select count(*) from h_event where key.name='abc' " returns incorrect count and is always less than the actual number of records having key.name='abc' in the table.
"select count(*) , key.name from h_event group by key.name" also returns the wrong counts for various values of key.name and the subtotals returned by this query do not add up to the total number of records in the table.

"select * from h_event where key.name='abc' " returns the expected number of rows with the right keys and data.
"select count(*) from h_event" returns the correct number of rows in the table
"select count(key) from h_event" returns the wrong number, lesser than the actual rows

"select dummy from h_event where key.name='abc' " returns correct rows with right data
"select key.userId from h_event where key.name='abc' " returns wrong(lesser) number of rows

It seems to me that composite keys just don't seem to work in the expected manner; if you have any of the key columns in the select clause, the returned results are incomplete. Am I missing something obvious here, or is the only solution to go back to duplicating key columns again in the table ?

Any help is greatly appreciated.

Thanks
Rupinder



This email is intended for the person(s) to whom it is addressed and may contain information that is PRIVILEGED or CONFIDENTIAL. Any unauthorized use, distribution, copying, or disclosure by any person other than the addressee(s) is strictly prohibited. If you have received this email in error, please notify the sender immediately by return email and delete the message and any attachments from your system.