Posted to issues@kudu.apache.org by "Grant Henke (JIRA)" <ji...@apache.org> on 2018/05/24 21:36:00 UTC
[jira] [Updated] (KUDU-2454) Avro Import/Export does not round trip
[ https://issues.apache.org/jira/browse/KUDU-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke updated KUDU-2454:
------------------------------
Affects Version/s: 1.5.0
> Avro Import/Export does not round trip
> --------------------------------------
>
> Key: KUDU-2454
> URL: https://issues.apache.org/jira/browse/KUDU-2454
> Project: Kudu
> Issue Type: Bug
> Affects Versions: 1.5.0
> Reporter: Grant Henke
> Priority: Critical
>
> When exporting to Avro, columns with type Byte or Short are treated as Integers because Avro doesn't have a Byte or Short type. When re-importing the data, the job fails because the column types do not match.
> Ideally, spark-avro would solve this by safely casting the values back to the smaller type. Guava has utilities that make this straightforward (e.g. Shorts.checkedCast(i)). We could send a pull request to spark-avro to fix this, or add special handling on the Kudu side to perform the safe down-conversion.
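A minimal sketch of the safe down-conversion described above, written in plain Java so it has no dependencies. The class and method names (`AvroNarrowing`, `toShortChecked`, `toByteChecked`) are hypothetical and not part of Kudu or spark-avro. Guava's `Shorts.checkedCast(long)` and `SignedBytes.checkedCast(long)` provide the same behavior: they throw `IllegalArgumentException` when the value does not fit instead of silently truncating it.

```java
// Hypothetical helpers illustrating a checked narrowing cast for values that
// were widened from Kudu INT8/INT16 to Avro "int" on export.
final class AvroNarrowing {
    private AvroNarrowing() {}

    // Narrow an Avro int back to a Java short (Kudu INT16), rejecting
    // values that would otherwise overflow silently.
    static short toShortChecked(int value) {
        short result = (short) value;
        if (result != value) {
            throw new IllegalArgumentException("Out of range for short: " + value);
        }
        return result;
    }

    // Narrow an Avro int back to a Java byte (Kudu INT8).
    static byte toByteChecked(int value) {
        byte result = (byte) value;
        if (result != value) {
            throw new IllegalArgumentException("Out of range for byte: " + value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Export side: Avro has no byte/short type, so the value is widened to int.
        short original = (short) -1234;
        int exported = original;
        // Import side: narrow back, failing loudly instead of truncating.
        System.out.println(toShortChecked(exported)); // -1234
        try {
            toShortChecked(70_000);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Out of range for short: 70000
        }
    }
}
```

Failing on out-of-range values, rather than casting unconditionally, means a corrupt or mismatched file surfaces as an import error instead of as silently wrapped numbers.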
> Another type issue when exporting is that Decimal values are written as Strings instead of as BigDecimal logical types. There are a few unmerged pull requests that fix this:
> * [https://github.com/databricks/spark-avro/pull/276]
> * [https://github.com/databricks/spark-avro/pull/121]
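For context, the Avro specification's `decimal` logical type annotates a `bytes` (or `fixed`) field. It stores the unscaled integer as big-endian two's-complement bytes and keeps the precision and scale in the schema. The sketch below shows only that encoding using `java.math`. It is not spark-avro's implementation, and the class name `DecimalBytes` is hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper showing the Avro "decimal" logical-type encoding:
// the unscaled value as big-endian two's-complement bytes, with the scale
// carried by the schema rather than by each value.
final class DecimalBytes {
    private DecimalBytes() {}

    static byte[] encode(BigDecimal value, int schemaScale) {
        // setScale(int) throws ArithmeticException if rounding would be
        // needed, so a value with more fractional digits than the schema
        // allows is rejected rather than silently rounded.
        return value.setScale(schemaScale).unscaledValue().toByteArray();
    }

    static BigDecimal decode(byte[] bytes, int schemaScale) {
        return new BigDecimal(new BigInteger(bytes), schemaScale);
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("-123.45");
        byte[] encoded = encode(price, 2);
        System.out.println(decode(encoded, 2)); // -123.45
    }
}
```

Because the scale lives in the schema, the reader can reconstruct an exact decimal value and type. A String column carries neither.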
> Additionally, Timestamp values are written as longs instead of Timestamp logical types (timestamp-micros). This is a data corruption issue because the long [value that is output|https://github.com/databricks/spark-avro/blob/0764d699015975acf87dc5210cca8a43db84196a/src/main/scala/com/databricks/spark/avro/AvroOutputWriter.scala#L103] is in milliseconds (Timestamp.getTime()), but a Kudu Timestamp column expects the long value in microseconds.
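The unit mismatch can be illustrated with `java.sql.Timestamp`: `getTime()` returns milliseconds since the Unix epoch, while the Kudu column expects microseconds. Below is a minimal sketch of a lossless conversion, assuming the exporter has the `Timestamp` value in hand. The class and method names are hypothetical. Sub-millisecond precision lives in `getNanos()`, so the sketch combines whole seconds with the nanosecond field instead of simply multiplying `getTime()` by 1000.

```java
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical helper contrasting the millisecond value written by
// Timestamp.getTime() with the microsecond value a Kudu column expects.
final class TimestampMicros {
    private TimestampMicros() {}

    // Microseconds since the Unix epoch, keeping sub-millisecond precision.
    static long toMicros(Timestamp ts) {
        // getTime() already includes the millisecond part of getNanos(), so
        // take only whole seconds from it; floorDiv keeps pre-1970 values right.
        long seconds = Math.floorDiv(ts.getTime(), 1000L);
        return seconds * 1_000_000L + ts.getNanos() / 1_000L;
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.from(Instant.parse("2018-05-24T21:36:00.123456Z"));
        long millis = ts.getTime();  // the value spark-avro writes
        long micros = toMicros(ts);  // the value a Kudu timestamp column expects
        System.out.println(millis);
        System.out.println(micros);
        // Misreading the millisecond value as microseconds yields a date in January 1970.
        System.out.println(Instant.EPOCH.plusNanos(millis * 1_000L));
    }
}
```

The last line shows why this is silent corruption rather than a failure: the import succeeds, but every timestamp lands a factor of 1000 closer to the epoch.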
> Given all these issues, ImportExportFiles needs much more test coverage before we suggest its use. Currently it only tests importing Strings from a CSV and does not test Avro or Parquet support.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)