Posted to issues@hive.apache.org by "Ganesh Tripathi (JIRA)" <ji...@apache.org> on 2017/09/15 12:05:00 UTC

[jira] [Assigned] (HIVE-17451) Cannot read decimal from avro file created with HIVE

     [ https://issues.apache.org/jira/browse/HIVE-17451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ganesh Tripathi reassigned HIVE-17451:
--------------------------------------

    Assignee: Ganesh Tripathi

> Cannot read decimal from avro file created with HIVE
> ----------------------------------------------------
>
>                 Key: HIVE-17451
>                 URL: https://issues.apache.org/jira/browse/HIVE-17451
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.1.0
>            Reporter: liviu
>            Assignee: Ganesh Tripathi
>            Priority: Blocker
>
> Hi,
> When we export decimal data from a Hive managed table to a Hive Avro external table (as bytes with the decimal logicalType), the value in the Avro file cannot be read correctly by any other tool (e.g. avro-tools, Spark, DataStage).
> _+Scenario:+_
> *create a Hive managed table and insert a decimal record:*
> {code:java}
> create table test_decimal (col1 decimal(20,2));
> insert into table test_decimal values (3.12);
> {code}
> *create an Avro schema /tmp/test_decimal.avsc with the content below:*
> {code:java}
> {
>   "type" : "record",
>   "name" : "decimal_test_avro",
>   "fields" : [ {
>     "name" : "col1",
>     "type" : [ "null", {
>       "type" : "bytes",
>       "logicalType" : "decimal",
>       "precision" : 20,
>       "scale" : 2
>     } ],
>     "default" : null,
>     "columnName" : "col1",
>     "sqlType" : "2"
>   }],
>   "tableName" : "decimal_test_avro"
> }
> {code}
> *create a Hive external table stored as Avro:*
> {code:java}
> create external table test_decimal_avro
> STORED AS AVRO
> LOCATION '/tmp/test_decimal'
> TBLPROPERTIES (
>   'avro.schema.url'='/tmp/test_decimal.avsc',
>   'orc.compress'='SNAPPY');
> {code}
> *insert data into the Avro external table from the Hive managed table:*
> {code:java}
> set hive.exec.compress.output=true;
> set hive.exec.compress.intermediate=true;
> set avro.output.codec=snappy; 
> insert overwrite table test_decimal_avro select * from test_decimal;
> {code}
> *reading data from the Hive Avro table through the Hive CLI succeeds:*
> {code:java}
> select * from test_decimal_avro;
> OK
> 3.12
> {code}
> *the Avro schema read back from the created file is OK:*
> {code:java}
> hadoop jar /avro-tools.jar getschema /tmp/test_decimal/000000_0
> {
>   "type" : "record",
>   "name" : "decimal_test_avro",
>   "fields" : [ {
>     "name" : "col1",
>     "type" : [ "null", {
>       "type" : "bytes",
>       "logicalType" : "decimal",
>       "precision" : 20,
>       "scale" : 2
>     } ],
>     "default" : null,
>     "columnName" : "col1",
>     "sqlType" : "2"
>   } ],
>   "tableName" : "decimal_test_avro"
> }
> {code}
> *reading the Avro file with avro-tools {color:#d04437}fails{color}: it returns {color:#d04437}"\u00018"{color} instead of the correct value:*
> {code:java}
> hadoop jar avro-tools.jar tojson /tmp/test_decimal/000000_0
> {"col1":{"bytes":"\u00018"}}
> {code}
> *reading into a Spark DataFrame also fails: it returns {color:#d04437}[01 38]{color}, and {color:#d04437}8{color} when cast to string, instead of the correct "3.12" value:*
> {code:java}
> val df = sql.read.avro("/tmp/test_decimal")
> df: org.apache.spark.sql.DataFrame = [col1: binary]
> scala> df.show()
> +-------+
> |   col1|
> +-------+
> |[01 38]|
> +-------+
> scala> df.withColumn("col2", 'col1.cast("String")).select("col2").show()
> +----+
> |col2|
> +----+
> |  8|
> +----+
> {code}
> Is this a Hive bug, or is there anything else I can do to get correct values in the Avro file created by Hive?
> Thanks,
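
For context on the raw values shown above: the Avro specification stores a decimal logicalType as the two's-complement big-endian unscaled integer, with the scale taken from the schema. The bytes avro-tools printed as "\u00018" are 0x01 0x38 = 312, which with scale 2 is the expected 3.12, so readers that ignore the logicalType see only the raw bytes. A minimal Python sketch of that decoding (the function name is illustrative, not a Hive or Avro API):

```python
from decimal import Decimal

def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    # Avro decimal logicalType: bytes hold the two's-complement
    # big-endian unscaled value; the scale comes from the schema.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# The value from the outputs above: [01 38] = 312, scale 2 -> 3.12
print(decode_avro_decimal(b"\x01\x38", 2))  # -> 3.12
```

The same signed interpretation handles negative decimals, e.g. 0xFE 0xC8 decodes to -3.12 at scale 2.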



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)