Posted to dev@hive.apache.org by "liviu (JIRA)" <ji...@apache.org> on 2017/09/05 06:04:00 UTC
[jira] [Created] (HIVE-17451) Cannot read decimal from avro file created with HIVE
liviu created HIVE-17451:
----------------------------
Summary: Cannot read decimal from avro file created with HIVE
Key: HIVE-17451
URL: https://issues.apache.org/jira/browse/HIVE-17451
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.1.0
Reporter: liviu
Priority: Blocker
Hi,
When we export decimal data from a Hive managed table to a Hive Avro external table (as bytes with the decimal logicalType), the values in the resulting Avro file cannot be read correctly by any other tool (e.g. avro-tools, Spark, DataStage...).
_+Scenario:+_
*create a hive managed table and insert a decimal record:*
{code:java}
create table test_decimal (col1 decimal(20,2));
insert into table test_decimal values (3.12);
{code}
*create an avro schema /tmp/test_decimal.avsc with the content below:*
{code:java}
{
  "type" : "record",
  "name" : "decimal_test_avro",
  "fields" : [ {
    "name" : "col1",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 20,
      "scale" : 2
    } ],
    "default" : null,
    "columnName" : "col1",
    "sqlType" : "2"
  } ],
  "tableName" : "decimal_test_avro"
}
{code}
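For reference, the Avro specification stores a decimal logical type as the unscaled value in big-endian two's-complement bytes. A minimal sketch (plain Scala, not part of the original report) of how 3.12 at scale 2 should be encoded under this schema:
{code:java}
import java.math.BigDecimal

// 3.12 at scale 2 -> unscaled value 312 -> big-endian two's-complement bytes
val dec = new BigDecimal("3.12")
val bytes = dec.unscaledValue().toByteArray()               // Array(0x01, 0x38)
println(bytes.map(b => f"$b%02x").mkString("[", " ", "]"))  // prints [01 38]
{code}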
*create a hive external table stored as avro:*
{code:java}
create external table test_decimal_avro
STORED AS AVRO
LOCATION '/tmp/test_decimal'
TBLPROPERTIES (
'avro.schema.url'='/tmp/test_decimal.avsc',
'orc.compress'='SNAPPY');
{code}
*insert data into the avro external table from the hive managed table:*
{code:java}
set hive.exec.compress.output=true;
set hive.exec.compress.intermediate=true;
set avro.output.codec=snappy;
insert overwrite table test_decimal_avro select * from test_decimal;
{code}
*successfully reading data from the hive avro table through the hive cli:*
{code:java}
select * from test_decimal_avro;
OK
3.12
{code}
*the avro schema from the created avro file is ok:*
{code:java}
hadoop jar /avro-tools.jar getschema /tmp/test_decimal/000000_0
{
  "type" : "record",
  "name" : "decimal_test_avro",
  "fields" : [ {
    "name" : "col1",
    "type" : [ "null", {
      "type" : "bytes",
      "logicalType" : "decimal",
      "precision" : 20,
      "scale" : 2
    } ],
    "default" : null,
    "columnName" : "col1",
    "sqlType" : "2"
  } ],
  "tableName" : "decimal_test_avro"
}
{code}
*reading data from the avro file with avro-tools gives an {color:#d04437}error{color}: got the {color:#d04437}"\u00018"{color} value instead of the correct one:*
{code:java}
hadoop jar avro-tools.jar tojson /tmp/test_decimal/000000_0
{"col1":{"bytes":"\u00018"}}
{code}
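For what it's worth, "\u00018" is the JSON escaping of the two bytes 0x01 0x38; avro-tools' tojson appears to print the raw bytes of a bytes field without applying the decimal logical type. Read per the Avro decimal spec, those bytes are the big-endian two's complement of 312, i.e. 3.12 at scale 2. A quick decoding sketch (not from the original report):
{code:java}
import java.math.{BigDecimal, BigInteger}

// "\u00018" is the escaped form of the two bytes 0x01 0x38
val raw = Array[Byte](0x01, 0x38)
val unscaled = new BigInteger(raw)    // big-endian two's complement -> 312
println(new BigDecimal(unscaled, 2))  // apply scale 2 -> prints 3.12
{code}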
*reading the data into a spark dataframe also fails: got {color:#d04437}[01 38]{color}, and {color:#d04437}8{color} when converted to string, instead of the correct "3.12" value:*
{code:java}
val df = sql.read.avro("/tmp/test_decimal")
df: org.apache.spark.sql.DataFrame = [col1: binary]
scala> df.show()
+-------+
| col1|
+-------+
|[01 38]|
+-------+
scala> df.withColumn("col2", 'col1.cast("String")).select("col2").show()
+----+
|col2|
+----+
| 8|
+----+
{code}
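As a possible workaround (an untested sketch; the column name and scale are taken from the example above), the binary column can be decoded in Spark with a UDF that applies the same big-endian two's-complement interpretation:
{code:java}
import java.math.{BigDecimal, BigInteger}
import org.apache.spark.sql.functions.{col, udf}

// decode Avro decimal bytes: unscaled big-endian two's complement, scale 2
val toDec = udf((b: Array[Byte]) => new BigDecimal(new BigInteger(b), 2).toString)
df.withColumn("col2", toDec(col("col1"))).select("col2").show()
// expected output: 3.12
{code}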
Is this a Hive bug, or is there anything else I can do in order to get correct values in the avro file created by Hive?
Thanks,
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)