You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Daniel Pruckler <Da...@t8webware.com> on 2010/03/15 21:44:24 UTC
Problem with HBaseIntegration. Integers show up as NULL in hive
I create a table in hbase
create 'hbase_table', 'colfam'
Then run the following code to insert a row into the table:
public class Test
{
public static void main(String[] args) throws Exception
{
String key = "key";
String stringValue = "string";
int intValue = 1;
double doubleValue = 1.5;
boolean booleanValue = false;
Configuration conf = new Configuration();
HTable table = new HTable(new HBaseConfiguration(conf), "hbase_table");
Put put = new Put(Bytes.toBytes(key));
put.add(Bytes.toBytes("colfam"), Bytes.toBytes("string_value"), Bytes.toBytes(stringValue));
put.add(Bytes.toBytes("colfam"), Bytes.toBytes("int_value"), Bytes.toBytes(intValue));
put.add(Bytes.toBytes("colfam"), Bytes.toBytes("int_value_as_string"), Bytes.toBytes(Integer.toString(intValue)));
put.add(Bytes.toBytes("colfam"), Bytes.toBytes("double_value"), Bytes.toBytes(doubleValue));
put.add(Bytes.toBytes("colfam"), Bytes.toBytes("boolean_value"), Bytes.toBytes(booleanValue));
table.put(put);
}
}
scan 'hbase_table'
ROW COLUMN+CELL
key column=colfam:boolean_value, timestamp=1268684020790, value=\x00
key column=colfam:double_value, timestamp=1268684020790, value=\x3F\xF8\x00\x00\x00\x
00\x00\x00
key column=colfam:int_value, timestamp=1268684020790, value=\x00\x00\x00\x01
key column=colfam:string_value, timestamp=1268684020790, value=string
Then in hive I run
CREATE EXTERNAL TABLE hive_table(key string, string_value string, int_value int, int_value_as_string int, double_value double, boolean_value boolean)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "colfam:string_value,colfam:int_value,colfam:int_value_as_string,colfam:double_value,colfam:boolean_value",
"hbase.table.name" = "hbase_table"
);
Then when selecting back I get null values
select * from hive_table;
OK
key string NULL 1 NULL NULL
The tables are all set to the type I expect:
describe hive_table;
OK
key string from deserializer
string_value string from deserializer
int_value int from deserializer
int_value_as_string int from deserializer
double_value double from deserializer
boolean_value boolean from deserializer
Only the strings and integer that was written to hbase as a string seem to be showing up in hive. I'm not seeing any errors in the hive logs. Do all values have to be stored as a string in hbase to work with HBaseIntegration?
Thanks
Re: Problem with HBaseIntegration. Integers show up as NULL in hive
Posted by John Sichi <js...@facebook.com>.
Thanks for the bug report.
So far, most of the testing has been round-tripping data from Hive into HBase and back (rather than accessing existing HBase tables). I'll log a followup task for mapping in data created in other formats, with suggestions welcome on how the type-mapping specification should look.
JVS
On Mar 15, 2010, at 1:44 PM, Daniel Pruckler wrote:
I create a table in hbase
create 'hbase_table', 'colfam'
Then run the following code to insert a row into the table:
public class Test
{
public static void main(String[] args) throws Exception
{
String key = "key";
String stringValue = "string";
int intValue = 1;
double doubleValue = 1.5;
boolean booleanValue = false;
Configuration conf = new Configuration();
HTable table = new HTable(new HBaseConfiguration(conf), "hbase_table");
Put put = new Put(Bytes.toBytes(key));
put.add(Bytes.toBytes("colfam"), Bytes.toBytes("string_value"), Bytes.toBytes(stringValue));
put.add(Bytes.toBytes("colfam"), Bytes.toBytes("int_value"), Bytes.toBytes(intValue));
put.add(Bytes.toBytes("colfam"), Bytes.toBytes("int_value_as_string"), Bytes.toBytes(Integer.toString(intValue)));
put.add(Bytes.toBytes("colfam"), Bytes.toBytes("double_value"), Bytes.toBytes(doubleValue));
put.add(Bytes.toBytes("colfam"), Bytes.toBytes("boolean_value"), Bytes.toBytes(booleanValue));
table.put(put);
}
}
scan 'hbase_table'
ROW COLUMN+CELL
key column=colfam:boolean_value, timestamp=1268684020790, value=\x00
key column=colfam:double_value, timestamp=1268684020790, value=\x3F\xF8\x00\x00\x00\x
00\x00\x00
key column=colfam:int_value, timestamp=1268684020790, value=\x00\x00\x00\x01
key column=colfam:string_value, timestamp=1268684020790, value=string
Then in hive I run
CREATE EXTERNAL TABLE hive_table(key string, string_value string, int_value int, int_value_as_string int, double_value double, boolean_value boolean)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "colfam:string_value,colfam:int_value,colfam:int_value_as_string,colfam:double_value,colfam:boolean_value",
"hbase.table.name<http://hbase.table.name>" = "hbase_table"
);
Then when selecting back I get null values
select * from hive_table;
OK
key string NULL 1 NULL NULL
The tables are all set to the type I expect:
describe hive_table;
OK
key string from deserializer
string_value string from deserializer
int_value int from deserializer
int_value_as_string int from deserializer
double_value double from deserializer
boolean_value boolean from deserializer
Only the strings and integer that was written to hbase as a string seem to be showing up in hive. I'm not seeing any errors in the hive logs. Do all values have to be stored as a string in hbase to work with HBaseIntegration?
Thanks