Posted to dev@hive.apache.org by "liviu (JIRA)" <ji...@apache.org> on 2017/09/05 06:04:00 UTC
[jira] [Created] (HIVE-17451) Cannot read decimal from avro file created with HIVE
liviu created HIVE-17451:
----------------------------
Summary: Cannot read decimal from avro file created with HIVE
Key: HIVE-17451
URL: https://issues.apache.org/jira/browse/HIVE-17451
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.1.0
Reporter: liviu
Priority: Blocker
Hi,
When we export decimal data from a Hive managed table to a Hive Avro external table (as bytes with the decimal logicalType), the values in the resulting Avro file cannot be read correctly by any other tool (e.g. avro-tools, Spark, DataStage...).
_+Scenario:+_
*create a hive managed table and insert a decimal record:*
{code:java}
create table test_decimal (col1 decimal(20,2));
insert into table test_decimal values (3.12);
{code}
*create an avro schema /tmp/test_decimal.avsc with the content below:*
{code:java}
{
  "type" : "record",
  "name" : "decimal_test_avro",
  "fields" : [ {
    "name" : "col1",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 20,
      "scale" : 2
    } ],
    "default" : null,
    "columnName" : "col1",
    "sqlType" : "2"
  } ],
  "tableName" : "decimal_test_avro"
}
{code}
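For reference, the Avro specification stores a decimal logical type as the unscaled value in big-endian two's-complement bytes. A minimal sketch (plain Scala, not part of the original report) of how 3.12 at scale 2 should be encoded under this schema:
{code:java}
import java.math.BigDecimal

// 3.12 at scale 2 -> unscaled value 312 -> big-endian two's-complement bytes
val dec = new BigDecimal("3.12")
val bytes = dec.unscaledValue().toByteArray()               // Array(0x01, 0x38)
println(bytes.map(b => f"$b%02x").mkString("[", " ", "]"))  // prints [01 38]
{code}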
*create a hive external table stored as avro:*
{code:java}
create external table test_decimal_avro
STORED AS AVRO
LOCATION '/tmp/test_decimal'
TBLPROPERTIES (
'avro.schema.url'='/tmp/test_decimal.avsc',
'orc.compress'='SNAPPY');
{code}
*insert data into the avro external table from the hive managed table:*
{code:java}
set hive.exec.compress.output=true;
set hive.exec.compress.intermediate=true;
set avro.output.codec=snappy;
insert overwrite table test_decimal_avro select * from test_decimal;
{code}
*successfully reading data from the hive avro table through the hive cli:*
{code:java}
select * from test_decimal_avro;
OK
3.12
{code}
*the avro schema from the created avro file is ok:*
{code:java}
hadoop jar /avro-tools.jar getschema /tmp/test_decimal/000000_0
{
  "type" : "record",
  "name" : "decimal_test_avro",
  "fields" : [ {
    "name" : "col1",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 20,
      "scale" : 2
    } ],
    "default" : null,
    "columnName" : "col1",
    "sqlType" : "2"
  } ],
  "tableName" : "decimal_test_avro"
}
{code}
*reading data from the avro file with avro-tools gives an {color:#d04437}error{color}: got the {color:#d04437}"\u00018"{color} value instead of the correct one:*
{code:java}
hadoop jar avro-tools.jar tojson /tmp/test_decimal/000000_0
{"col1":{"bytes":"\u00018"}}
{code}
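For what it's worth, "\u00018" is the JSON escaping of the two bytes 0x01 0x38; avro-tools' tojson appears to print the raw bytes of a bytes field without applying the decimal logical type. Read per the Avro decimal spec, those bytes are the big-endian two's complement of 312, i.e. 3.12 at scale 2. A quick decoding sketch (not from the original report):
{code:java}
import java.math.{BigDecimal, BigInteger}

// "\u00018" is the escaped form of the two bytes 0x01 0x38
val raw = Array[Byte](0x01, 0x38)
val unscaled = new BigInteger(raw)    // big-endian two's complement -> 312
println(new BigDecimal(unscaled, 2))  // apply scale 2 -> prints 3.12
{code}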
*reading the data into a spark dataframe also fails: got {color:#d04437}[01 38]{color}, and {color:#d04437}8{color} when converted to string, instead of the correct "3.12" value:*
{code:java}
val df = sql.read.avro("/tmp/test_decimal")
df: org.apache.spark.sql.DataFrame = [col1: binary]
scala> df.show()
+-------+
| col1|
+-------+
|[01 38]|
+-------+
scala> df.withColumn("col2", 'col1.cast("String")).select("col2").show()
+----+
|col2|
+----+
| 8|
+----+
{code}
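As a possible workaround (an untested sketch; the column name and scale are taken from the example above), the binary column can be decoded in Spark with a UDF that applies the same big-endian two's-complement interpretation:
{code:java}
import java.math.{BigDecimal, BigInteger}
import org.apache.spark.sql.functions.{col, udf}

// decode Avro decimal bytes: unscaled big-endian two's complement, scale 2
val toDec = udf((b: Array[Byte]) => new BigDecimal(new BigInteger(b), 2).toString)
df.withColumn("col2", toDec(col("col1"))).select("col2").show()
// expected output: 3.12
{code}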
Is this a Hive bug, or is there anything else I can do in order to get correct values in the avro file created by Hive?
Thanks,
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)