Posted to dev@hive.apache.org by Marcel Steinbach <ms...@icloud.com> on 2017/09/19 08:52:09 UTC

HBase column mappings with whitespace suffix/prefix

We have HBase tables whose column qualifiers end in characters that Java treats as whitespace. We wanted short qualifiers, ideally a single byte, and started counting at \u0001.
 
Now I need to hook the HBase table into Hive, so I define a column mapping, e.g.
 
CREATE EXTERNAL TABLE abc (key string, column string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ":key,A:\u0001")
TBLPROPERTIES ('hbase.table.name' = 'hbase_abc');
 
The problem is that with 'hbase.columns.mapping' = ":key,A:\u0001", the second entry, A:\u0001, ends with a character that String.trim() ([3]) treats as whitespace (anything <= \u0020), so because of [1] and [2] the qualifier gets trimmed away.
Even the Hive documentation on HBase integration is wrong here ([4]):
 
    "whitespace should not be used in between entries since these will be interperted as part of the column name, which is almost certainly not what you want"
 
The rationale for HIVE-3243 was to make the mapping "less confusing". However, one could argue that it added an implicit "auto-correct" on top of the column-mapping syntax, which is even worse, because it narrows the set of HBase column qualifiers you can use.
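
To make the effect concrete, here is a minimal Java sketch of what I believe happens to the mapping. It is a simplified stand-in, not the actual HBaseSerDe code: the class name ColumnMappingTrimDemo and the method parse are made up for illustration, and I am only assuming that the parser splits on commas and calls String.trim() on each entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the column-mapping parser; not the actual HBaseSerDe code.
class ColumnMappingTrimDemo {

    // Splits a mapping spec on commas, optionally applying String.trim() to each entry.
    static List<String> parse(String spec, boolean trimEntries) {
        List<String> entries = new ArrayList<>();
        for (String entry : spec.split(",")) {
            entries.add(trimEntries ? entry.trim() : entry);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Same mapping as in the CREATE TABLE statement: the qualifier is the single char U+0001.
        String spec = ":key,A:\u0001";

        String trimmed = parse(spec, true).get(1);
        String untouched = parse(spec, false).get(1);

        // With trimming, the one-character qualifier is gone and only "A:" remains.
        System.out.println(trimmed.length());   // 2
        System.out.println(untouched.length()); // 3
    }
}
```

With trimming, the entry collapses to "A:", so the qualifier I actually stored in HBase can no longer be expressed in the mapping.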
 
I do see the backwards-compatibility issue: if we change this, it will change the current behaviour for people who rely on the whitespace trimming.

What are your opinions? 

Regards,
Marcel
 
[1] https://github.com/apache/hive/blob/32e854ef1c25f21d53f7932723cfc76bf75a71cd/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java#L178
[2] https://issues.apache.org/jira/browse/HIVE-3243
[3] https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#trim()
[4] https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-ColumnMapping