Posted to dev@hive.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2013/01/09 11:25:56 UTC

[jira] [Commented] (HIVE-1634) Allow access to Primitive types stored in binary format in HBase

    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547939#comment-13547939 ] 

Hudson commented on HIVE-1634:
------------------------------

Integrated in Hive-trunk-hadoop2 #54 (See [https://builds.apache.org/job/Hive-trunk-hadoop2/54/])
    HIVE-2958 [jira] GROUP BY causing ClassCastException [LazyDioInteger cannot be
cast to LazyInteger]
(Navis Ryu via Ashutosh Chauhan)

Summary:
DPAL-1111 GROUP BY causing ClassCastException [LazyDioInteger cannot be cast
to LazyInteger]

This relates to https://issues.apache.org/jira/browse/HIVE-1634.

The following statements work fine:

CREATE EXTERNAL TABLE tim_hbase_occurrence (
  id int,
  scientific_name string,
  data_resource_id int
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key#b,v:scientific_name#s,v:data_resource_id#b"
) TBLPROPERTIES(
  "hbase.table.name" = "mini_occurrences",
  "hbase.table.default.storage.type" = "binary"
);
SELECT * FROM tim_hbase_occurrence LIMIT 3;
SELECT * FROM tim_hbase_occurrence WHERE data_resource_id=1081 LIMIT 3;

However, the following fails:

SELECT data_resource_id, count(*) FROM tim_hbase_occurrence GROUP BY
data_resource_id;

The error given:

0 TS
2012-04-17 16:58:45,693 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initialization Done 7 MAP
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias tim_hbase_occurrence for file hdfs://c1n2.gbif.org/user/hive/warehouse/tim_hbase_occurrence
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 1 rows
2012-04-17 16:58:45,714 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2012-04-17 16:58:45,716 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1 rows
2012-04-17 16:58:45,723 FATAL ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":1444,"scientific_name":null,"data_resource_id":1081}
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:548)
	at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:737)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:529)
	... 9 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazydio.LazyDioInteger cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyInteger
	at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyIntObjectInspector.copyObject(LazyIntObjectInspector.java:43)
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:239)
	at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
	at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
	at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:750)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:722)
	... 18 more
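
The innermost frame of the trace is LazyIntObjectInspector.copyObject, which
fails casting a LazyDioInteger to LazyInteger. The Java sketch below reproduces
that failure mode in isolation. Every class in it is a hypothetical stand-in,
not Hive's real implementation, and the subclass variant is only one plausible
way to satisfy such a cast, not a description of the committed patch.

```java
public class LazyCastSketch {

    // Hypothetical base type shared by both readers.
    static class LazyPrimitive {
        final int value;
        LazyPrimitive(int value) { this.value = value; }
    }

    // Hypothetical stand-in for the text-format integer reader.
    static class LazyInteger extends LazyPrimitive {
        LazyInteger(int value) { super(value); }
    }

    // Binary reader modelled as a sibling of LazyInteger: the cast below fails.
    static class SiblingDioInteger extends LazyPrimitive {
        SiblingDioInteger(int value) { super(value); }
    }

    // Binary reader modelled as a subclass of LazyInteger: the cast succeeds.
    static class SubclassDioInteger extends LazyInteger {
        SubclassDioInteger(int value) { super(value); }
    }

    // Mimics an inspector that assumes every int field is a LazyInteger.
    static int copyObject(Object field) {
        return ((LazyInteger) field).value;
    }

    // Reports whether copyObject completes without a ClassCastException.
    static boolean copySucceeds(Object field) {
        try {
            copyObject(field);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(copySucceeds(new SiblingDioInteger(1081)));
        System.out.println(copySucceeds(new SubclassDioInteger(1081)));
    }
}
```

In this sketch only the grouping path calls copyObject, which would be
consistent with SELECT and WHERE succeeding while GROUP BY fails once the key
has to be deep-copied.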

Test Plan: EMPTY

Reviewers: JIRA, ashutoshc

Reviewed By: ashutoshc

Differential Revision: https://reviews.facebook.net/D2871 (Revision 1328157)
HIVE-1634: Allow access to Primitive types stored in binary format in HBase (Basab Maulik, Ashutosh Chauhan via hashutosh) (Revision 1298673)

     Result = ABORTED
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1328157
Files : 
* /hive/trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q
* /hive/trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioBoolean.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioByte.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioDouble.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioFloat.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioInteger.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioLong.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioShort.java

hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298673
Files : 
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsAggregator.java
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStatsPublisher.java
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableOutputFormat.java
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java
* /hive/trunk/hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java
* /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestSetup.java
* /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java
* /hive/trunk/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java
* /hive/trunk/hbase-handler/src/test/queries/hbase_binary_external_table_queries.q
* /hive/trunk/hbase-handler/src/test/queries/hbase_binary_map_queries.q
* /hive/trunk/hbase-handler/src/test/queries/hbase_binary_storage_queries.q
* /hive/trunk/hbase-handler/src/test/results/hbase_binary_external_table_queries.q.out
* /hive/trunk/hbase-handler/src/test/results/hbase_binary_map_queries.q.out
* /hive/trunk/hbase-handler/src/test/results/hbase_binary_storage_queries.q.out
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyObject.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyPrimitive.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioBoolean.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioByte.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioDouble.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioFloat.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioInteger.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioLong.java
* /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/lazydio/LazyDioShort.java

                
> Allow access to Primitive types stored in binary format in HBase
> ----------------------------------------------------------------
>
>                 Key: HIVE-1634
>                 URL: https://issues.apache.org/jira/browse/HIVE-1634
>             Project: Hive
>          Issue Type: New Feature
>          Components: HBase Handler
>    Affects Versions: 0.7.0, 0.8.0, 0.9.0
>            Reporter: Basab Maulik
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.9.0
>
>         Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.3.patch, HIVE-1634.0.patch, HIVE-1634.1.patch, hive-1634_3.patch, HIVE-1634.branch08.patch, TestHiveHBaseExternalTable.java
>
>
> This addresses HIVE-1245 in part, for atomic or primitive types.
> The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" specifies the storage option for each corresponding column in the serde property "hbase.columns.mapping". Allowed values are '-' for the table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from org.apache.hadoop.hbase.util.Bytes. Map types for HBase column families use a colon-separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for the HBase handler for additional examples.
> There is also a table property "hbase.table.default.storage.type" = "string" to specify a table level default storage type. The other valid specification is "binary". The table level default is overridden by a column level specification.
> This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key.
> Attached is a program for creating a table and populating it in HBase. The external table in Hive can access the data as shown in the example below.
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.691 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1	NULL	NULL	NULL	NULL	NULL	Test-String	NULL	NULL
> Time taken: 0.346 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.139 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
>     >  tblproperties (
>     >  "hbase.table.name" = "TestHiveHBaseExternalTable",
>     >  "hbase.table.default.storage.type" = "string");
> OK
> Time taken: 0.139 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1	true	-128	-32768	-2147483648	-9223372036854775808	Test-String	-2.1793132E-11	2.01345E291
> Time taken: 0.151 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.154 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b" )
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.347 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1	true	-128	-32768	-2147483648	-9223372036854775808	Test-String	-2.1793132E-11	2.01345E291
> Time taken: 0.245 seconds
> hive> 
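
To illustrate the 's' and 'b' storage options quoted above, here is a minimal
Java sketch of how an int differs under the two encodings and how a spec such
as "-,b,b" might be resolved against the table default. Assumptions: the class
and method names are hypothetical; java.nio.ByteBuffer (big-endian by default)
stands in for HBase's Bytes.toBytes(int), which yields the same 4 bytes;
map specifiers such as 's:b' are not handled; and returning null when the text
parse fails is only an assumed analogue of the NULL columns in the first query.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class StorageTypeSketch {

    // 's' storage: the value's decimal text, encoded as UTF-8.
    static byte[] encodeAsString(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // 'b' storage: 4 bytes, big-endian (ByteBuffer's default byte order).
    static byte[] encodeAsBinary(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Read a cell as string storage; null stands in for a SQL NULL (assumption).
    static Integer decodeAsString(byte[] cell) {
        try {
            return Integer.parseInt(new String(cell, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Read a cell as binary storage; expects exactly the 4 bytes written above.
    static int decodeAsBinary(byte[] cell) {
        return ByteBuffer.wrap(cell).getInt();
    }

    // Resolve a per-column spec against the table default ('s' or 'b').
    static String resolve(String spec, char tableDefault) {
        StringBuilder resolved = new StringBuilder();
        for (String token : spec.split(",")) {
            String t = token.trim();
            resolved.append(t.equals("-") ? tableDefault : t.charAt(0));
        }
        return resolved.toString();
    }

    public static void main(String[] args) {
        byte[] binary = encodeAsBinary(1081);
        System.out.println(resolve("-,b,b,b,b,b,-,b,b", 's'));
        System.out.println(decodeAsBinary(binary));
        System.out.println(decodeAsString(binary));
        System.out.println(decodeAsString(encodeAsString(1081)));
    }
}
```

Reading the binary cell with the string decoder yields null here because the
raw bytes are not decimal text, while the binary decoder round-trips the value.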

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira