Posted to issues@hive.apache.org by "Ilya Kats (JIRA)" <ji...@apache.org> on 2016/01/29 17:26:40 UTC

[jira] [Commented] (HIVE-6147) Support avro data stored in HBase columns

    [ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123690#comment-15123690 ] 

Ilya Kats commented on HIVE-6147:
---------------------------------

I'm trying to create a table in Hive 0.14 that points to an HBase table with one column family ("c") and one column ("b") that contains a schema-less Avro-serialized object:
{code:sql}
CREATE EXTERNAL TABLE customers
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,c:b", 
  "c.b.serialization.type"="avro", 
  "c.b.avro.schema.url"="hdfs:/....../Customer.avsc") 
TBLPROPERTIES ("hbase.table.name" = "customers", "hbase.struct.autogenerate"="true", "hive.serialization.extend.nesting.levels"="true");
{code}

The DDL above creates the table successfully, but queries fail with the following error:
{code}
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
16/01/29 15:36:55 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:82)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:571)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:563)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
	... 12 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorException: An error occurred retrieving schema from bytes
	at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.retrieveSchemaFromBytes(AvroLazyObjectInspector.java:331)
	at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.deserializeStruct(AvroLazyObjectInspector.java:287)
	at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.getStructFieldData(AvroLazyObjectInspector.java:142)
	at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:109)
	at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldData(DelegatedStructObjectInspector.java:88)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
	... 17 more
Caused by: java.io.IOException: Not a data file.
	at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
	at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
	at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.retrieveSchemaFromBytes(AvroLazyObjectInspector.java:328)
	... 25 more
{code}

It seems that there is a problem in the following code in AvroLazyObjectInspector:
{code}
...
private Object deserializeStruct(Object struct, String fieldName) {
...
if (readerSchema == null) {
...
} else {
      // a reader schema was provided
      if (schemaRetriever != null) {
        // a schema retriever has been provided as well. Attempt to read the write schema from the
        // retriever
        ws = schemaRetriever.retrieveWriterSchema(data);

        if (ws == null) {
          throw new IllegalStateException(
              "Null writer schema retrieved from schemaRetriever for field [" + fieldName + "]");
        }
      } else {
        // attempt retrieving the schema from the data
        ws = retrieveSchemaFromBytes(data);   
      }

      rs = readerSchema;

      try {
        avroWritable.readFields(data, ws, rs);
      } catch (IOException ioe) {
        throw new AvroObjectInspectorException("Error deserializing avro payload", ioe);
      }
    }
...
}
...
{code}
When no schema retriever is configured, this code tries to read the writer schema from the data itself ({{ws = retrieveSchemaFromBytes(data)}}), even though a schema URL (the reader schema) has been provided. Is there a way to make it work for schema-less Avro data?
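
To illustrate the root cause ({{Not a data file.}}): as far as I can tell, {{DataFileStream}} expects the bytes to start with the Avro object container header, whose first four bytes are the magic {{'O', 'b', 'j', 1}}. A schema-less (raw binary-encoded) datum has no such header, so there is no embedded writer schema to read. Below is a minimal JDK-only sketch of that check. The class and method names are hypothetical and are not part of Hive or Avro, and the sample bytes are made up for illustration.

```java
final class AvroContainerCheck {
    // Avro object container files begin with the 4-byte magic 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the bytes start with the Avro container-file magic. */
    static boolean looksLikeAvroContainer(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical cell value that starts with the container header.
        byte[] container = {'O', 'b', 'j', 1, 2};
        // Hypothetical schema-less datum: the Avro string "abc" is encoded as
        // its zigzag varint length (3 -> 0x06) followed by the UTF-8 bytes.
        byte[] rawDatum = {0x06, 'a', 'b', 'c'};

        System.out.println(looksLikeAvroContainer(container)); // prints "true"
        System.out.println(looksLikeAvroContainer(rawDatum));  // prints "false"
    }
}
```

Running a check like this against the bytes stored in {{c:b}} would show whether the column holds container-formatted data or a bare datum, which is the case that {{retrieveSchemaFromBytes}} cannot handle.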


> Support avro data stored in HBase columns
> -----------------------------------------
>
>                 Key: HIVE-6147
>                 URL: https://issues.apache.org/jira/browse/HIVE-6147
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Swarnim Kulkarni
>            Assignee: Swarnim Kulkarni
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
>
> Presently, the HBase Hive integration supports querying only primitive data types in columns. It would be nice to be able to store and query Avro objects in HBase columns by making them visible as structs to Hive. This will allow Hive to perform ad hoc analysis of HBase data which can be deeply structured.


