Posted to dev@hive.apache.org by "Basab Maulik (JIRA)" <ji...@apache.org> on 2010/09/13 22:28:33 UTC

[jira] Created: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Allow access to Primitive types stored in binary format in HBase
----------------------------------------------------------------

                 Key: HIVE-1634
                 URL: https://issues.apache.org/jira/browse/HIVE-1634
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: HBase Handler
    Affects Versions: 0.7.0
            Reporter: Basab Maulik
            Assignee: Basab Maulik


This addresses HIVE-1245 in part, for atomic or primitive types.

The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" specifies the storage option for each corresponding column in the serde property "hbase.columns.mapping". Allowed values are '-' for the table default, 's' for standard string storage, and 'b' for binary storage, as would be obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon-separated pair, such as 's:b', for the key and value specifiers respectively. See the test cases and queries for the HBase handler for additional examples.
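A minimal Java sketch of how the "hbase.columns.storage.types" value could be read, assuming one comma-separated token per entry in "hbase.columns.mapping" and at most one colon inside a map column's token. StorageSpecParser, Storage and ColumnSpec are hypothetical names for illustration, not classes from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the "hbase.columns.storage.types" serde property. */
final class StorageSpecParser {

  /** '-' defers to the table default, 's' means string storage, 'b' means binary storage. */
  enum Storage { DEFAULT, STRING, BINARY }

  /** Storage for one mapped column; keyStorage is non-null only for map (column family) entries. */
  static final class ColumnSpec {
    final Storage keyStorage;
    final Storage valueStorage;

    ColumnSpec(Storage keyStorage, Storage valueStorage) {
      this.keyStorage = keyStorage;
      this.valueStorage = valueStorage;
    }
  }

  static Storage parseOne(String token) {
    switch (token.trim()) {
      case "-": return Storage.DEFAULT;
      case "s": return Storage.STRING;
      case "b": return Storage.BINARY;
      default: throw new IllegalArgumentException("Unknown storage specifier: '" + token + "'");
    }
  }

  /** Parses e.g. "-,b,s:b"; the token count must match the number of entries in the column mapping. */
  static List<ColumnSpec> parse(String spec, int mappedColumns) {
    String[] tokens = spec.split(",", -1);
    if (tokens.length != mappedColumns) {
      throw new IllegalArgumentException(
          "Expected " + mappedColumns + " storage specifiers but found " + tokens.length);
    }
    List<ColumnSpec> result = new ArrayList<>();
    for (String token : tokens) {
      int colon = token.indexOf(':');
      if (colon >= 0) {
        // Map column: "<key spec>:<value spec>", e.g. "s:b".
        result.add(new ColumnSpec(parseOne(token.substring(0, colon)),
                                  parseOne(token.substring(colon + 1))));
      } else {
        result.add(new ColumnSpec(null, parseOne(token)));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<ColumnSpec> specs = parse("-,b,b,b,b,b,b,b,b", 9);
    System.out.println(specs.get(0).valueStorage + " " + specs.get(1).valueStorage);
  }
}
```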

There is also a table property "hbase.table.default.storage.type" = "string" to specify a table-level default storage type; the only other valid value is "binary". A column-level specification overrides the table-level default.
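The precedence rule can be sketched as follows. StorageResolver is a hypothetical helper, and falling back to "string" when "hbase.table.default.storage.type" is absent is an assumption, not something stated above.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical resolution of a column's effective storage type. */
final class StorageResolver {
  enum Storage { STRING, BINARY }

  static final String TABLE_DEFAULT_PROPERTY = "hbase.table.default.storage.type";

  /** Reads the table-level default; assumes "string" when the property is not set. */
  static Storage tableDefault(Map<String, String> tblProperties) {
    String value = tblProperties.getOrDefault(TABLE_DEFAULT_PROPERTY, "string");
    switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "string": return Storage.STRING;
      case "binary": return Storage.BINARY;
      default: throw new IllegalArgumentException("Invalid " + TABLE_DEFAULT_PROPERTY + ": " + value);
    }
  }

  /** A column-level 's' or 'b' wins; '-' falls back to the table-level default. */
  static Storage resolve(char columnSpec, Storage tableDefault) {
    switch (columnSpec) {
      case '-': return tableDefault;
      case 's': return Storage.STRING;
      case 'b': return Storage.BINARY;
      default: throw new IllegalArgumentException("Invalid column specifier: " + columnSpec);
    }
  }
}
```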

This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.
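To make "binary storage" concrete, here is a sketch of fixed-width, big-endian encodings for the listed types, using java.nio.ByteBuffer as a stand-in for o.a.h.hbase.util.Bytes. That Bytes produces the same layouts (2/4/8 bytes for smallint/int/bigint, raw IEEE 754 bits for float/double, one byte for tinyint and boolean, with true written as (byte) -1) is an assumption to check against the HBase version in use. FixedWidthCodec is a hypothetical class.

```java
import java.nio.ByteBuffer;

/** Sketch of fixed-width, big-endian cell encodings (ByteBuffer defaults to big-endian). */
final class FixedWidthCodec {
  // boolean: one byte; writing true as (byte) -1 mirrors our assumption about Bytes.
  static byte[] encodeBoolean(boolean v) { return new byte[] { v ? (byte) -1 : (byte) 0 }; }
  static boolean decodeBoolean(byte[] b) { return b[0] != 0; }

  // tinyint: the byte itself.
  static byte[] encodeByte(byte v) { return new byte[] { v }; }
  static byte decodeByte(byte[] b) { return b[0]; }

  static byte[] encodeShort(short v) { return ByteBuffer.allocate(Short.BYTES).putShort(v).array(); }
  static short decodeShort(byte[] b) { return ByteBuffer.wrap(b).getShort(); }

  static byte[] encodeInt(int v) { return ByteBuffer.allocate(Integer.BYTES).putInt(v).array(); }
  static int decodeInt(byte[] b) { return ByteBuffer.wrap(b).getInt(); }

  static byte[] encodeLong(long v) { return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); }
  static long decodeLong(byte[] b) { return ByteBuffer.wrap(b).getLong(); }

  static byte[] encodeFloat(float v) { return ByteBuffer.allocate(Float.BYTES).putFloat(v).array(); }
  static float decodeFloat(byte[] b) { return ByteBuffer.wrap(b).getFloat(); }

  static byte[] encodeDouble(double v) { return ByteBuffer.allocate(Double.BYTES).putDouble(v).array(); }
  static double decodeDouble(byte[] b) { return ByteBuffer.wrap(b).getDouble(); }
}
```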

Attached is a program that creates and populates a table in HBase. An external table in Hive can then access the data, as shown in the example below.

hive> create external table TestHiveHBaseExternalTable
    > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
    >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    >  with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
    >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.691 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1	NULL	NULL	NULL	NULL	NULL	Test-String	NULL	NULL
Time taken: 0.346 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.139 seconds
hive> create external table TestHiveHBaseExternalTable
    > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
    >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    >  with serdeproperties (
    >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
    >  "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
    >  tblproperties (
    >  "hbase.table.name" = "TestHiveHBaseExternalTable",
    >  "hbase.table.default.storage.type" = "string");
OK
Time taken: 0.139 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1	true	-128	-32768	-2147483648	-9223372036854775808	Test-String	-2.1793132E-11	2.01345E291
Time taken: 0.151 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.154 seconds
hive> create external table TestHiveHBaseExternalTable
    > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
    >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
    >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    >  with serdeproperties (
    >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
    >  "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b" )
    >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.347 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1	true	-128	-32768	-2147483648	-9223372036854775808	Test-String	-2.1793132E-11	2.01345E291
Time taken: 0.245 seconds
hive> 
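The NULLs in the first session are consistent with binary cells being read as text. A minimal sketch, assuming string storage means "decode the cell as UTF-8 and parse it, yielding NULL on failure": the four bytes of Integer.MIN_VALUE do not parse as a decimal number, while the c_string cell reads the same either way because its bytes are already text. StringVsBinaryRead is a hypothetical class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of string-mode vs binary-mode reads of the same int cell. */
final class StringVsBinaryRead {
  /** String storage: interpret the cell as decimal text; unparseable input becomes null (NULL). */
  static Integer readIntAsString(byte[] cell) {
    try {
      return Integer.valueOf(new String(cell, StandardCharsets.UTF_8).trim());
    } catch (NumberFormatException e) {
      return null;
    }
  }

  /** Binary storage: interpret the cell as a 4-byte big-endian int; other lengths become null. */
  static Integer readIntAsBinary(byte[] cell) {
    if (cell.length != Integer.BYTES) {
      return null;
    }
    return ByteBuffer.wrap(cell).getInt();
  }
}
```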

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Posted by "Basab Maulik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923776#action_12923776 ] 

Basab Maulik commented on HIVE-1634:
------------------------------------

Re: Beyond the review comments I added, I do have some higher-level suggestions:

    * For the column mapping, the reason I suggested "a:b:string" in the original JIRA description is that it's a pain to keep everything lined up by column position. It's already less than ideal that we do the column name mapping by position, so I don't think we should make it worse by having a separate property for type. Using the s/b shorthand is fine, and if you think that we shouldn't overload the colon, we can use a different separator, e.g. "cf:cq#s". Since the existing property name is hbase.columns.mapping, I don't think it will be confusing to roll in the (optional) type info as well.

I have adopted your suggestion of '#' as the separator for the storage information, and 'hbase.columns.mapping' now optionally carries that information. I have also made a small change so that any prefix of 'string' or 'binary' is valid, e.g. s/b, str/bin, or string/binary.
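A sketch of the '#' suffix and the prefix matching described above, assuming that an entry without '#' falls back to the table default. MappingEntryParser is a hypothetical name, and map key/value pairs such as 's:b' are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for "hbase.columns.mapping" entries with an optional "#storage" suffix. */
final class MappingEntryParser {
  enum Storage { TABLE_DEFAULT, STRING, BINARY }

  static final class Entry {
    final String column;    // e.g. ":key" or "cf:cq"
    final Storage storage;  // TABLE_DEFAULT when the entry has no '#' suffix

    Entry(String column, Storage storage) {
      this.column = column;
      this.storage = storage;
    }
  }

  /** Accepts any non-empty prefix of "string" or "binary": s/str/string, b/bin/binary. */
  static Storage parseStorage(String token) {
    if (!token.isEmpty() && "string".startsWith(token)) return Storage.STRING;
    if (!token.isEmpty() && "binary".startsWith(token)) return Storage.BINARY;
    throw new IllegalArgumentException(
        "Storage must be a prefix of 'string' or 'binary': '" + token + "'");
  }

  static Entry parseEntry(String raw) {
    String entry = raw.trim();
    int hash = entry.indexOf('#');
    if (hash < 0) {
      return new Entry(entry, Storage.TABLE_DEFAULT);
    }
    return new Entry(entry.substring(0, hash), parseStorage(entry.substring(hash + 1)));
  }

  static List<Entry> parse(String mapping) {
    List<Entry> entries = new ArrayList<>();
    for (String raw : mapping.split(",")) {
      entries.add(parseEntry(raw));
    }
    return entries;
  }
}
```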

    * I'm wondering whether we can just use the existing classes like LazyBinaryByte in package org.apache.hadoop.hive.serde2.lazybinary instead of creating new ones. Or are these not compatible with hbase.utils.Bytes?

I think the incompatibility stems more from trying to stay within the serde2.lazy.Lazy family of objects, which HBaseSerDe, LazyHBaseRow, and LazyHBaseCellMap extend or depend on. It would be useful to make these two families of classes compatible (e.g. by having them inherit from a common base class). Small differences in the object inspector classes that type-parametrize these classes further complicate getting past the type system. This should be doable, but perhaps as a separate patch?

    * For the tests, I noticed that you have attached TestHiveHBaseExternalTable. I think it would be a good idea if you can create and populate such a fixture table in HBaseTestSetup; that way it can be available (treated as read-only) to all of the HBase .q tests. Otherwise, it's hard to verify that we're compatible with a table created directly through HBase APIs rather than Hive.

Done. Added tests to create a Hive external table associated with this HBase table and test queries.

    * Also for the tests, it would be good if you can filter it down to only a small number of representative rows when pulling the initial test data set from the Hive src table. That way, we can keep the .q.out files smaller.

Done; the .q.out files are a lot smaller than in the initial patch.

    * Once we get this one committed, be sure to update the wiki.

Will do once this is committed.


> Allow access to Primitive types stored in binary format in HBase
> ----------------------------------------------------------------
>
>                 Key: HIVE-1634
>                 URL: https://issues.apache.org/jira/browse/HIVE-1634
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.7.0
>            Reporter: Basab Maulik
>            Assignee: Basab Maulik
>         Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java
>


[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910312#action_12910312 ] 

HBase Review Board commented on HIVE-1634:
------------------------------------------

Message from: "John Sichi" <js...@facebook.com>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/#review1247
-----------------------------------------------------------



trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
<http://review.cloudera.org/r/826/#comment4213>

    We keep adding new List data members.  Probably time to move to a single List<ColumnMapping>, with a new class ColumnMapping with fields for familyName, familyNameBytes, qualifierName, qualifierNameBytes, familyBinary, qualifierBinary.  That will be a lot cleaner and also allow you to avoid the boolean [] here, which is a little clumsy.
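A sketch of the suggested holder, keeping the field names from the comment above. How ":key" is represented and what exactly familyBinary and qualifierBinary control are left open; this illustrates the refactoring idea and is not the class the patch ended up with.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** One object per Hive column instead of several parallel lists; field names follow the review. */
final class ColumnMapping {
  final String familyName;
  final byte[] familyNameBytes;
  final String qualifierName;       // null when the whole family is mapped (a Hive map column)
  final byte[] qualifierNameBytes;  // null when qualifierName is null
  final boolean familyBinary;       // binary-storage flags named in the review; exact meaning left open
  final boolean qualifierBinary;

  ColumnMapping(String familyName, String qualifierName,
                boolean familyBinary, boolean qualifierBinary) {
    this.familyName = familyName;
    this.familyNameBytes = familyName.getBytes(StandardCharsets.UTF_8);
    this.qualifierName = qualifierName;
    this.qualifierNameBytes =
        qualifierName == null ? null : qualifierName.getBytes(StandardCharsets.UTF_8);
    this.familyBinary = familyBinary;
    this.qualifierBinary = qualifierBinary;
  }

  /** Builds a mapping from "cf:cq" (one column) or "cf:" (whole family); ":key" is not handled here. */
  static ColumnMapping of(String spec, boolean familyBinary, boolean qualifierBinary) {
    int colon = spec.indexOf(':');
    if (colon <= 0) {
      throw new IllegalArgumentException("Expected family:qualifier but got '" + spec + "'");
    }
    String qualifier = spec.substring(colon + 1);
    return new ColumnMapping(spec.substring(0, colon),
        qualifier.isEmpty() ? null : qualifier, familyBinary, qualifierBinary);
  }

  boolean isMapFamily() {
    return qualifierName == null;
  }

  /** Replaces the parallel lists with a single List<ColumnMapping> (flags default to false here). */
  static List<ColumnMapping> parseAll(String... specs) {
    List<ColumnMapping> mappings = new ArrayList<>();
    for (String spec : specs) {
      mappings.add(of(spec, false, false));
    }
    return mappings;
  }
}
```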



trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
<http://review.cloudera.org/r/826/#comment4210>

    Doesn't this error message need to change?



trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
<http://review.cloudera.org/r/826/#comment4214>

    I don't understand these TODO's.



trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java
<http://review.cloudera.org/r/826/#comment4215>

    Why is this assertion commented out?


- John







[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928356#action_12928356 ] 

John Sichi commented on HIVE-1634:
----------------------------------

But looking into it further, is it true that the only difference in persistence format is for Long and Integer (due to the zero-compression)?  Or are any of the other formats different as well?  If it's only these, then adding a whole new set of classes seems like a bad idea, and we should instead do any necessary refactoring now to allow the existing binary classes to be used (and add a couple of new ones for uncompressed int/long).
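The format difference raised here can be illustrated by comparing a fixed-width 8-byte encoding with a zero-compressed variable-length one. The vlong method below is modeled on Hadoop's WritableUtils.writeVLong; that the lazybinary VInt/VLong helpers use the same scheme is an assumption. IntegerEncodings is a hypothetical class.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical comparison of fixed-width and zero-compressed long encodings. */
final class IntegerEncodings {
  /** Fixed-width 8-byte big-endian layout, as assumed for binary HBase cells. */
  static byte[] fixedLong(long v) {
    return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
  }

  /** Zero-compressed variable-length long, modeled on Hadoop's WritableUtils.writeVLong. */
  static byte[] vlong(long v) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (v >= -112 && v <= 127) {
      out.write((byte) v);  // small values fit in a single byte
      return out.toByteArray();
    }
    int len = -112;
    if (v < 0) {
      v ^= -1L;  // one's complement, so only the magnitude bytes are written
      len = -120;
    }
    long tmp = v;
    while (tmp != 0) {  // count the bytes needed for the magnitude
      tmp >>= 8;
      len--;
    }
    out.write((byte) len);  // first byte encodes sign and byte count
    int n = (len < -120) ? -(len + 120) : -(len + 112);
    for (int idx = n; idx != 0; idx--) {  // magnitude bytes, most significant first
      int shift = (idx - 1) * 8;
      out.write((byte) ((v >> shift) & 0xFF));
    }
    return out.toByteArray();
  }
}
```

Under this scheme small values shrink to one byte while extreme values grow to nine, so the same long cannot be read back with a fixed-width decoder.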

Considering the fact that we eventually want to be able to store map/struct/list as well (the rest of HIVE-1245), it's worth looking into the refactoring now, since the existing lazybinary covers those too (and we don't want to duplicate that).




[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925915#action_12925915 ] 

John Sichi commented on HIVE-1634:
----------------------------------

Thanks Basab, I'm going to try to take a look at this one next week.



[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923769#action_12923769 ] 

HBase Review Board commented on HIVE-1634:
------------------------------------------

Message from: bkm.hadoop@gmail.com


bq.  On 2010-09-16 13:28:48, John Sichi wrote:
bq.  > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 499
bq.  > <http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line499>
bq.  >
bq.  >     Doesn't this error message need to change?

Updated the comment to "' should be mapped to Map<? extends LazyPrimitive<?, ?>,?>, that is " + "the Key for the map should be of primitive type, but is ... "
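The check behind that message can be sketched as follows: a column mapped to a whole HBase column family must be a Hive map whose key is a primitive type. MapKeyCheck, its crude string-based type parsing, and the exact set of accepted primitive names are assumptions for illustration; the patch works on object inspectors, not type strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical validation that a column-family mapping targets map<primitive, ?>. */
final class MapKeyCheck {
  private static final Set<String> PRIMITIVES = new HashSet<>(Arrays.asList(
      "boolean", "tinyint", "smallint", "int", "bigint", "float", "double", "string"));

  /** Returns the key type of a "map<K,V>" type name, or null if the type is not a map. */
  static String mapKeyType(String hiveType) {
    String t = hiveType.trim().toLowerCase(Locale.ROOT);
    if (!t.startsWith("map<") || !t.endsWith(">")) {
      return null;
    }
    int comma = t.indexOf(',');
    return comma < 0 ? null : t.substring(4, comma).trim();
  }

  /** Throws if the column is not a map or its key type is not primitive. */
  static void checkFamilyMapping(String column, String hiveType) {
    String key = mapKeyType(hiveType);
    if (key == null || !PRIMITIVES.contains(key)) {
      throw new IllegalArgumentException("Column '" + column
          + "' should be a map with a primitive key type, but is " + hiveType);
    }
  }
}
```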


bq.  On 2010-09-16 13:28:48, John Sichi wrote:
bq.  > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 623
bq.  > <http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line623>
bq.  >
bq.  >     I don't understand these TODO's.

Removed/updated comment.


bq.  On 2010-09-16 13:28:48, John Sichi wrote:
bq.  > trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 76
bq.  > <http://review.cloudera.org/r/826/diff/1/?file=11523#file11523line76>
bq.  >
bq.  >     We keep adding new List data members.  Probably time to move to a single List<ColumnMapping>, with a new class ColumnMapping with fields for familyName, familyNameBytes, qualifierName, qualifierNameBytes, familyBinary, qualifierBinary.  That will be a lot cleaner and also allow you to avoid the boolean [] here, which is a little clumsy.

I have changed the code to use List<ColumnMapping> with the fields of interest as members of this data class.


bq.  On 2010-09-16 13:28:48, John Sichi wrote:
bq.  > trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java, line 480
bq.  > <http://review.cloudera.org/r/826/diff/1/?file=11526#file11526line480>
bq.  >
bq.  >     Why is this assertion commented out?

I have removed this test; the .q files provide coverage for this case. It was failing because of small differences between the byte arrays produced by DataOutputStream/DataInputStream and those produced by o.a.h.hbase.util.Bytes.
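One plausible source of such "small differences" (a guess, not confirmed in this thread) is boolean encoding: DataOutputStream.writeBoolean(true) writes (byte) 1, whereas Bytes.toBytes(true) is assumed here to write (byte) -1. Fixed-width ints come out identical in both. EncodingComparison and its Bytes-style emulation are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

/** Hypothetical side-by-side of DataOutputStream output and an assumed Bytes-style layout. */
final class EncodingComparison {
  /** java.io.DataOutputStream layout: a boolean byte followed by a big-endian int. */
  static byte[] viaDataOutput(boolean flag, int value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeBoolean(flag);  // 1 byte: 1 for true, 0 for false
      out.writeInt(value);     // 4 bytes, high byte first
    } catch (IOException e) {
      throw new UncheckedIOException(e);  // not expected for an in-memory stream
    }
    return bytes.toByteArray();
  }

  /** Assumed Bytes-style layout: true as (byte) -1, then the int as 4 big-endian bytes. */
  static byte[] viaBytesStyle(boolean flag, int value) {
    return ByteBuffer.allocate(1 + Integer.BYTES)
        .put(flag ? (byte) -1 : (byte) 0)
        .putInt(value)
        .array();
  }
}
```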


- bkm


>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b" )
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.347 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1	true	-128	-32768	-2147483648	-9223372036854775808	Test-String	-2.1793132E-11	2.01345E291
> Time taken: 0.245 seconds
> hive> 
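The NULLs in the first SELECT of the quoted session fit a simple explanation. The numeric cells hold raw binary bytes, but without a storage specification they are read as text, and text parsing of those bytes fails. The sketch below contrasts the two read paths for a 4-byte big-endian int. It is a simplified illustration, not Hive's LazyInteger code. readAsText and readAsBinary are hypothetical names, and the NULL-on-parse-failure behavior is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class TextVsBinaryRead {
    // String-mode read (assumed behavior): decode as text, parse decimal digits,
    // and yield NULL (null here) when parsing fails.
    static Integer readAsText(byte[] cell) {
        try {
            return Integer.parseInt(new String(cell, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Binary-mode read: interpret exactly 4 bytes as a big-endian int.
    static Integer readAsBinary(byte[] cell) {
        if (cell.length != Integer.BYTES) {
            return null; // wrong width: treat as NULL in this sketch
        }
        return ByteBuffer.wrap(cell).getInt();
    }

    public static void main(String[] args) {
        byte[] minInt = ByteBuffer.allocate(Integer.BYTES).putInt(Integer.MIN_VALUE).array();
        System.out.println(readAsText(minInt));                                 // null
        System.out.println(readAsBinary(minInt));                               // -2147483648
        System.out.println(readAsText("42".getBytes(StandardCharsets.UTF_8)));  // 42
    }
}
```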

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928334#action_12928334 ] 

John Sichi commented on HIVE-1634:
----------------------------------

OK, I finally got some time to look into the Lazy* classes.  I see what you mean about the class hierarchy, and I agree that we can leave any refactoring of the existing classes for a followup patch.  Also, I was wrong to think that we could reuse the existing binary classes, since they do things such as VInt zero-compression, and that's incompatible with the HBase Bytes format.
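The incompatibility is easy to see by comparing lengths and bytes. The sketch below is our reimplementation of the zero-compressed variable-length scheme from Hadoop's WritableUtils.writeVLong, which we understand the lazybinary classes to use. It is set against the fixed-width big-endian layout assumed for HBase Bytes.toBytes(int). Small values shrink to one byte under VInt, and even multi-byte values carry a length-prefix byte, so neither reader can decode the other's cells.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

final class VIntVsFixedWidth {
    // Zero-compressed variable-length encoding (reimplemented, not imported from Hadoop).
    static byte[] zeroCompressed(long i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (i >= -112 && i <= 127) {
            out.write((byte) i); // small values fit in a single byte
            return out.toByteArray();
        }
        int len = -112;
        if (i < 0) {
            i ^= -1L; // one's complement for negative values
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--; // one step per significant byte
        }
        out.write((byte) len); // first byte encodes sign and length
        len = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = len; idx != 0; idx--) {
            out.write((byte) ((i >> ((idx - 1) * 8)) & 0xFF)); // big-endian payload
        }
        return out.toByteArray();
    }

    // Fixed 4-byte big-endian layout, assumed to match HBase Bytes.toBytes(int).
    static byte[] fixedWidth(int v) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(zeroCompressed(300))); // [-114, 1, 44]
        System.out.println(Arrays.toString(fixedWidth(300)));     // [0, 0, 1, 44]
    }
}
```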

However, for this patch, I want to at least get the new classes into their final destination with respect to package name and class name (so that we don't have to move them later, even if we adjust their inheritance).  To this end, I suggest a new package serde2.lazydio, and name the classes on the pattern LazyDioInteger.  The "Dio" is to indicate DataInput/DataOutput format.  (I was thinking of lazybytes and LazyByteInteger, to indicate HBase Bytes format, but then I saw that Byte is also one of the datatypes, and LazyBytesByte would be puzzling.)

Having both LazyIntegerBinary and LazyBinaryInteger, as in the current patch, would just be too confusing.

Also, regarding the implementation of the new classes, most of the init method code is duplicated from class to class.  The only thing specific to each class is the actual read+set.  Should we factor out a LazyDioObject (similar to the existing pattern for LazyObject and LazyBinaryObject)?  Likewise for LazyDioPrimitive and LazyDioNonPrimitive.
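The refactoring suggested above can be sketched as follows. A base class owns the common init logic once, and each type-specific subclass supplies only the read step. The class names and the init(byte[], int, int) signature are hypothetical simplifications for illustration. They are not Hive's LazyObject or LazyPrimitive API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical base class: shared init code lives here once.
abstract class DioPrimitiveSketch<T> {
    private T value;

    // Common init: wrap the cell slice in a DataInput and delegate the typed read.
    final void init(byte[] bytes, int start, int length) {
        try {
            DataInput in = new DataInputStream(new ByteArrayInputStream(bytes, start, length));
            value = read(in);
        } catch (IOException e) {
            value = null; // too few bytes (EOFException) or read failure: NULL
        }
    }

    // The only per-type piece: how to read one value.
    protected abstract T read(DataInput in) throws IOException;

    T get() {
        return value;
    }
}

final class DioIntegerSketch extends DioPrimitiveSketch<Integer> {
    @Override
    protected Integer read(DataInput in) throws IOException {
        return in.readInt(); // 4 bytes, big-endian
    }
}

final class DioBooleanSketch extends DioPrimitiveSketch<Boolean> {
    @Override
    protected Boolean read(DataInput in) throws IOException {
        return in.readBoolean(); // any non-zero byte, including 0xFF, reads as true
    }
}
```

With this shape, adding a new type means writing one read method. The byte-slicing and NULL handling cannot drift between classes.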

I will ask some others to chime in on this as well.


[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909003#action_12909003 ] 

HBase Review Board commented on HIVE-1634:
------------------------------------------

Message from: bkm.hadoop@gmail.com

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/
-----------------------------------------------------------

Review request for Hive Developers and John Sichi.


Summary
-------

This addresses HIVE-1245 in part, for atomic or primitive types.

The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a specification of the storage option for the corresponding column in the serde property "hbase.columns.mapping". Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families use a colon separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for HBase handler for additional examples.

There is also a table property "hbase.table.default.storage.type" = "string" to specify a table level default storage type. The other valid specification is "binary". The table level default is overridden by a column level specification.

This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.
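The resolution rules above (positional entries, '-' falling back to the table default, 's:b' pairs for map families) can be sketched as a small resolver. This is a hypothetical illustration of the described semantics, not the HBaseSerDe code. The method names are invented, and the caller is assumed to pass the table default explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StorageTypeResolver {
    // Resolve one specifier: '-' = table default, 's' = string, 'b' = binary.
    static String parseSpec(String spec, String tableDefault) {
        switch (spec) {
            case "-": return tableDefault;
            case "s": return "string";
            case "b": return "binary";
            default: throw new IllegalArgumentException("bad storage spec: " + spec);
        }
    }

    // Entries line up by position with "hbase.columns.mapping";
    // a map family entry such as "s:b" gives key and value specifiers.
    static List<String> resolve(String storageTypes, String tableDefault) {
        if (!tableDefault.equals("string") && !tableDefault.equals("binary")) {
            throw new IllegalArgumentException("bad table default: " + tableDefault);
        }
        List<String> out = new ArrayList<>();
        for (String entry : storageTypes.split(",")) {
            StringBuilder sb = new StringBuilder();
            for (String part : entry.trim().split(":")) {
                if (sb.length() > 0) sb.append(':');
                sb.append(parseSpec(part, tableDefault));
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(resolve("-,b,s:b", "string")); // [string, binary, string:binary]
        System.out.println(resolve("-,s", "binary"));     // [binary, string]
    }
}
```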


This addresses bug HIVE-1634.
    http://issues.apache.org/jira/browse/HIVE-1634


Diffs
-----

  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 990439 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java 990439 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 990439 
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 990439 
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 990439 
  trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q PRE-CREATION 
  trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyBooleanBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyByteBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyDoubleBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 990439 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFloatBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyIntegerBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyLongBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyShortBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 990439 

Diff: http://review.cloudera.org/r/826/diff


Testing
-------

The HBase handler tests TestHBaseSerDe, TestLazyHBaseObject, TestHBaseCliDriver, and TestHBaseMinimrCliDriver pass.

New tests have been added to TestHBaseSerDe and TestLazyHBaseObject to test this feature.

New queries which exercise this feature have been added to query files hbase_binary_map_queries.q and hbase_binary_storage_queries.q.


Thanks,

bkm




[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923746#action_12923746 ] 

HBase Review Board commented on HIVE-1634:
------------------------------------------

Message from: bkm.hadoop@gmail.com

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/826/
-----------------------------------------------------------

(Updated 2010-10-21 20:11:06.837430)


Review request for Hive Developers and John Sichi.


Changes
-------

The proposed serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b", which specified the storage option for the corresponding column in the serde property "hbase.columns.mapping", has been removed. Instead, the storage option is an optional part of "hbase.columns.mapping" and is specified for a column using '#' as a separator following the column family/qualifier. Allowed values are '-' for the table default, a prefix of 'string' for standard string storage, and a prefix of 'binary' for binary storage as would be obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families use a colon-separated pair such as 'str:bin' or 's:b' for the key and value specifiers respectively.
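A minimal sketch of how the revised '#' syntax could be interpreted, under these assumptions: the first '#' in a column entry starts the storage specifier, an absent, empty or '-' specifier means the table default, and any prefix of "string" or "binary" selects that storage. The class and method names are hypothetical, and this is not the parser in HBaseSerDe.

```java
final class ColumnMappingParser {
    // Resolve one specifier by prefix: "s", "str", "string" -> string; "b", "bin" -> binary.
    static String resolvePrefix(String spec, String tableDefault) {
        if (spec.isEmpty() || spec.equals("-")) return tableDefault;
        if ("string".startsWith(spec)) return "string";
        if ("binary".startsWith(spec)) return "binary";
        throw new IllegalArgumentException("unknown storage spec: " + spec);
    }

    // Storage for one mapping entry such as "cf:int#b" or the map family "cf:#s:b".
    static String storageOf(String columnSpec, String tableDefault) {
        int hash = columnSpec.indexOf('#');
        if (hash < 0) {
            return tableDefault; // no '#': table default
        }
        StringBuilder sb = new StringBuilder();
        for (String part : columnSpec.substring(hash + 1).split(":", -1)) {
            if (sb.length() > 0) sb.append(':');
            sb.append(resolvePrefix(part, tableDefault));
        }
        return sb.toString();
    }

    // Resolve every comma-separated entry of "hbase.columns.mapping".
    static String[] resolveAll(String mapping, String tableDefault) {
        String[] cols = mapping.split(",");
        String[] out = new String[cols.length];
        for (int i = 0; i < cols.length; i++) {
            out[i] = storageOf(cols[i].trim(), tableDefault);
        }
        return out;
    }

    public static void main(String[] args) {
        String mapping = ":key,cf:int#b,cf:str#str,cf:#s:b";
        System.out.println(String.join(" ", resolveAll(mapping, "string")));
        // string binary string string:binary
    }
}
```

Keeping the type next to each column removes the need to keep a second property aligned by position, which was the motivation for the change.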

The tests TestHBaseSerDe, TestLazyHBaseObject, TestHBaseCliDriver, and TestHBaseMinimrCliDriver pass.


Summary
-------

This addresses HIVE-1245 in part, for atomic or primitive types.

The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a specification of the storage option for the corresponding column in the serde property "hbase.columns.mapping". Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families use a colon separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for HBase handler for additional examples.

There is also a table property "hbase.table.default.storage.type" = "string" to specify a table level default storage type. The other valid specification is "binary". The table level default is overridden by a column level specification.

This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.


This addresses bug HIVE-1634.
    http://issues.apache.org/jira/browse/HIVE-1634


Diffs (updated)
-----

  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 1023967 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java 1023967 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java 1023967 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 1023967 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 1023967 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java 1023967 
  trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java 1023967 
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java 1023967 
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java 1023967 
  trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 1023967 
  trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q PRE-CREATION 
  trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q PRE-CREATION 
  trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q PRE-CREATION 
  trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out PRE-CREATION 
  trunk/hbase-handler/src/test/results/hbase_binary_map_queries.q.out PRE-CREATION 
  trunk/hbase-handler/src/test/results/hbase_binary_storage_queries.q.out PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyBooleanBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyByteBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyDoubleBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java 1023967 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFloatBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyIntegerBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyLongBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyShortBinary.java PRE-CREATION 
  trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 1023967 

Diff: http://review.cloudera.org/r/826/diff


Testing
-------

The HBase handler tests TestHBaseSerDe, TestLazyHBaseObject, TestHBaseCliDriver, and TestHBaseMinimrCliDriver pass.

New tests have been added to TestHBaseSerDe and TestLazyHBaseObject to test this feature.

New queries which exercise this feature have been added to query files hbase_binary_map_queries.q and hbase_binary_storage_queries.q.


Thanks,

bkm




[jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910305#action_12910305 ] 

John Sichi commented on HIVE-1634:
----------------------------------

Hey Basab,

This is a great start.  Beyond the review comments I added, I do have some higher-level suggestions:

* For the column mapping, the reason I suggested "a:b:string" in the original JIRA description is that it's a pain to keep everything lined up by column position.  It's already less than ideal that we do the column name mapping by position, so I don't think we should make it worse by having a separate property for type.  Using the s/b shorthand is fine, and if you think that we shouldn't overload the colon, we can use a different separator, e.g. "cf:cq#s".  Since the existing property name is hbase.columns.mapping, I don't think it will be confusing to roll in the (optional) type info as well.

* I'm wondering whether we can just use the existing classes like LazyBinaryByte in package org.apache.hadoop.hive.serde2.lazybinary instead of creating new ones.  Or are these not compatible with hbase.utils.Bytes?

* For the tests, I noticed that you have attached TestHiveHBaseExternalTable.  I think it would be a good idea if you can create and populate such a fixture table in HBaseTestSetup; that way it can be available (treated as read-only) to all of the HBase .q tests.  Otherwise, it's hard to verify that we're compatible with a table created directly through HBase APIs rather than Hive.

* Also for the tests, it would be good if you can filter it down to only a small number of representative rows when pulling the initial test data set from the Hive src table.  That way, we can keep the .q.out files smaller.

* Once we get this one committed, be sure to update the wiki.
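For a fixture table populated directly through HBase, each cell of the example row (true, -128, -32768, Integer.MIN_VALUE, Long.MIN_VALUE, a float and a double) needs Bytes-compatible bytes. The plain-JDK sketch below assumes those are fixed-width and big-endian, with 0xFF for true. This agrees with the \xFF and \x80 values for cf:boolean and cf:byte in the HBase shell scan posted in this thread. The helper names are hypothetical. shellEscape escapes every byte, whereas the real shell prints printable ASCII as-is.

```java
import java.nio.ByteBuffer;

final class FixtureRowEncoding {
    // Each helper is assumed to match the corresponding HBase Bytes.toBytes overload.
    static byte[] ofBoolean(boolean b) { return new byte[] { b ? (byte) -1 : (byte) 0 }; }
    static byte[] ofByte(byte v) { return new byte[] { v }; }
    static byte[] ofShort(short v) { return ByteBuffer.allocate(Short.BYTES).putShort(v).array(); }
    static byte[] ofInt(int v) { return ByteBuffer.allocate(Integer.BYTES).putInt(v).array(); }
    static byte[] ofLong(long v) { return ByteBuffer.allocate(Long.BYTES).putLong(v).array(); }
    static byte[] ofFloat(float v) { return ofInt(Float.floatToRawIntBits(v)); }
    static byte[] ofDouble(double v) { return ofLong(Double.doubleToRawLongBits(v)); }

    // Render every byte as \xNN, similar to how the shell shows non-printable bytes.
    static String shellEscape(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("cf:boolean " + shellEscape(ofBoolean(true)));      // \xFF
        System.out.println("cf:byte    " + shellEscape(ofByte((byte) -128)));  // \x80
        System.out.println("cf:short   " + shellEscape(ofShort((short) -32768)));
        System.out.println("cf:int     " + shellEscape(ofInt(Integer.MIN_VALUE)));
        System.out.println("cf:long    " + shellEscape(ofLong(Long.MIN_VALUE)));
        System.out.println("cf:float   " + shellEscape(ofFloat(-2.1793132E-11f)));
        System.out.println("cf:double  " + shellEscape(ofDouble(2.01345E291)));
    }
}
```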


> Allow access to Primitive types stored in binary format in HBase
> ----------------------------------------------------------------
>
>                 Key: HIVE-1634
>                 URL: https://issues.apache.org/jira/browse/HIVE-1634
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.7.0
>            Reporter: Basab Maulik
>            Assignee: Basab Maulik
>         Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java
>
>
> This addresses HIVE-1245 in part, for atomic or primitive types.
> The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a specification of the storage option for the corresponding column in the serde property "hbase.columns.mapping". Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families use a colon separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for HBase handler for additional examples.
> There is also a table property "hbase.table.default.storage.type" = "string" to specify a table level default storage type. The other valid specification is "binary". The table level default is overridden by a column level specification.
> This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.
> Attached is a program for creating a table and populating it in HBase. The external table in Hive can access the data as shown in the example below.
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.691 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1	NULL	NULL	NULL	NULL	NULL	Test-String	NULL	NULL
> Time taken: 0.346 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.139 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
>     >  tblproperties (
>     >  "hbase.table.name" = "TestHiveHBaseExternalTable",
>     >  "hbase.table.default.storage.type" = "string");
> OK
> Time taken: 0.139 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1	true	-128	-32768	-2147483648	-9223372036854775808	Test-String	-2.1793132E-11	2.01345E291
> Time taken: 0.151 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.154 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b" )
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.347 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1	true	-128	-32768	-2147483648	-9223372036854775808	Test-String	-2.1793132E-11	2.01345E291
> Time taken: 0.245 seconds
> hive> 
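The precedence rules above ('-' falls back to the table-level default, and a column-level 's' or 'b' overrides it) can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not code from HIVE-1634.0.patch: the names `StorageSpecSketch`, `Storage`, `tableDefault`, `resolveOne`, and `resolve` are hypothetical; treating an absent "hbase.table.default.storage.type" as "string" is inferred from the third session above, where the '-' entry for c_string still reads as text; and resolving each half of a map column's colon-separated key:value pair (such as "s:b", as the issue describes) with the same rule is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not from the HIVE-1634 patch): resolves each entry of
// "hbase.columns.storage.types" against "hbase.table.default.storage.type".
public class StorageSpecSketch {

    enum Storage { STRING, BINARY }

    // Maps the table-level property value to a Storage.
    // Assumption: a missing property behaves like "string".
    static Storage tableDefault(String property) {
        if (property == null || property.equals("string")) {
            return Storage.STRING;
        }
        if (property.equals("binary")) {
            return Storage.BINARY;
        }
        throw new IllegalArgumentException("invalid table default: " + property);
    }

    // Resolves one specifier: '-' = table default, 's' = string, 'b' = binary.
    static Storage resolveOne(String spec, Storage dflt) {
        switch (spec) {
            case "-": return dflt;
            case "s": return Storage.STRING;
            case "b": return Storage.BINARY;
            default: throw new IllegalArgumentException("invalid specifier: " + spec);
        }
    }

    // Resolves the comma-separated list, one array per Hive column.
    // Assumption: a map column's "key:value" pair yields two entries, key first.
    static List<Storage[]> resolve(String storageTypes, String tableProperty) {
        Storage dflt = tableDefault(tableProperty);
        List<Storage[]> result = new ArrayList<>();
        for (String column : storageTypes.split(",")) {
            String[] parts = column.split(":");
            Storage[] resolved = new Storage[parts.length];
            for (int i = 0; i < parts.length; i++) {
                resolved[i] = resolveOne(parts[i], dflt);
            }
            result.add(resolved);
        }
        return result;
    }

    public static void main(String[] args) {
        // Third session above: no table default, c_string marked '-'.
        for (Storage[] c : resolve("-,b,b,b,b,b,-,b,b", null)) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

For the third session's "-,b,b,b,b,b,-,b,b" with no table default, `main` prints [STRING] for the key and c_string columns and [BINARY] for the other seven, which is consistent with that query returning both the binary values and "Test-String".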

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

Posted by "Basab Maulik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Basab Maulik updated HIVE-1634:
-------------------------------

    Attachment: TestHiveHBaseExternalTable.java
                HIVE-1634.0.patch

Attached is a preliminary patch for this issue.

A scan of the HBase table used in the example above, showing the binary-encoded cell values:

hbase(main):004:0> scan 'TestHiveHBaseExternalTable'
ROW                          COLUMN+CELL                                                                      
 key-1                       column=cf:boolean, timestamp=1284406847770, value=\xFF                           
 key-1                       column=cf:byte, timestamp=1284406847770, value=\x80                              
 key-1                       column=cf:double, timestamp=1284406847770, value=|i\xD3lwy\xDCb                  
 key-1                       column=cf:float, timestamp=1284406847770, value=\xAD\xBF\xB1\xC5                 
 key-1                       column=cf:int, timestamp=1284406847770, value=\x80\x00\x00\x00                   
 key-1                       column=cf:long, timestamp=1284406847770, value=\x80\x00\x00\x00\x00\x00\x00\x00  
 key-1                       column=cf:short, timestamp=1284406847770, value=\x80\x00                         
 key-1                       column=cf:string, timestamp=1284406847770, value=Test-String                     
1 row(s) in 0.3670 seconds
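The cell values in this scan are the big-endian layouts that HBase's org.apache.hadoop.hbase.util.Bytes.toBytes writes for each Java primitive (a boolean becomes the single byte 0xFF for true). The sketch below decodes the scanned bytes with java.nio.ByteBuffer rather than the HBase library, so it runs without HBase on the classpath; the class `BinaryCellSketch` and its methods are hypothetical. `parseAsText` is a simplified model of string storage, not Hive's actual parser: it illustrates why the first table definition, which reads every cell as text, returns NULL for the binary columns.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: decodes the raw cell values from the scan above.
// ByteBuffer is big-endian by default, matching the layout written by
// HBase's Bytes.toBytes(...); the HBase library itself is not used here.
public class BinaryCellSketch {

    // Bytes.toBytes(boolean) writes one byte: 0xFF for true, 0x00 for false.
    static boolean toBoolean(byte[] b) { return b[0] != 0; }

    static byte toByte(byte[] b) { return b[0]; }
    static short toShort(byte[] b) { return ByteBuffer.wrap(b).getShort(); }
    static int toInt(byte[] b) { return ByteBuffer.wrap(b).getInt(); }
    static long toLong(byte[] b) { return ByteBuffer.wrap(b).getLong(); }
    static float toFloat(byte[] b) { return ByteBuffer.wrap(b).getFloat(); }
    static double toDouble(byte[] b) { return ByteBuffer.wrap(b).getDouble(); }

    // Simplified model of string storage: read the cell as UTF-8 text and
    // parse it; unparseable text becomes null (the NULLs in the first query).
    static Integer parseAsText(byte[] b) {
        try {
            return Integer.valueOf(new String(b, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Builds a byte[] from int literals such as 0x80, avoiding casts at call sites.
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] intCell = bytes(0x80, 0x00, 0x00, 0x00);     // cf:int
        System.out.println(toBoolean(bytes(0xFF)));          // cf:boolean -> true
        System.out.println(toByte(bytes(0x80)));             // cf:byte -> -128
        System.out.println(toShort(bytes(0x80, 0x00)));      // cf:short -> -32768
        System.out.println(toInt(intCell));                  // -> -2147483648
        System.out.println(toLong(bytes(0x80, 0, 0, 0, 0, 0, 0, 0)));
        System.out.println(toFloat(bytes(0xAD, 0xBF, 0xB1, 0xC5)));
        // '|', 'i', 'l', 'w', 'y', 'b' in the scan are the bytes 0x7C, 0x69, 0x6C, 0x77, 0x79, 0x62
        System.out.println(toDouble(bytes(0x7C, 0x69, 0xD3, 0x6C, 0x77, 0x79, 0xDC, 0x62)));
        System.out.println(parseAsText(intCell));            // -> null
    }
}
```

The integer cells decode to the MIN_VALUE of each type, and the float and double cells decode to approximately -2.1793132E-11 and 2.01345E291, matching the binary-storage query results above.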

