You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/08/08 13:40:49 UTC
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/22037
[SPARK-24774][SQL] Avro: Support logical decimal type
## What changes were proposed in this pull request?
Support Avro logical date type:
https://avro.apache.org/docs/1.8.2/spec.html#Decimal
## How was this patch tested?
Unit test
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark avro_decimal
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22037.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22037
----
commit bbc54342e64428d96e4beb0701831ff471f991e7
Author: Gengliang Wang <ge...@...>
Date: 2018-08-08T13:28:49Z
avro decimal type
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94546 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94546/testReport)** for PR 22037 at commit [`9eefbe5`](https://github.com/apache/spark/commit/9eefbe5dc58bba272dedce7ae0174be89a0a9b28).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r208623546
--- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -475,6 +498,41 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
checkAnswer(df, expected)
}
+ test("Logical type: Decimal") {
+ val expected = Seq((1.23, 45.67), (65.37, 81.39))
+ .map { d =>
+ Row(new java.math.BigDecimal(d._1.toString), new java.math.BigDecimal(d._2.toString))
+ }
+ val df = spark.read.format("avro").load(decimalAvro)
+
+ checkAnswer(df, expected)
+
+ val avroSchema = s"""
+ {
+ "namespace": "logical",
+ "type": "record",
+ "name": "test",
+ "fields": [
+ {"name": "bytes", "type":
+ {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
+ },
+ {"name": "fixed", "type":
+ {"type": "fixed", "size": 5, "logicalType": "decimal",
+ "precision": 4, "scale": 2, "name": "foo"}
--- End diff --
Here, should I prettify the JSON string? I didn't want to make the function too long here.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94579/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:
https://github.com/apache/spark/pull/22037
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94519/testReport)** for PR 22037 at commit [`1c5d228`](https://github.com/apache/spark/commit/1c5d228a05af6bab0c89a432b5446647746f117d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94616/testReport)** for PR 22037 at commit [`0384a1a`](https://github.com/apache/spark/commit/0384a1af69573af317f9e644bcf04e12bf38f1f3).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94623/testReport)** for PR 22037 at commit [`6b11292`](https://github.com/apache/spark/commit/6b11292a319dad052590e15aaec097ff636bff7f).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94579/testReport)** for PR 22037 at commit [`0384a1a`](https://github.com/apache/spark/commit/0384a1af69573af317f9e644bcf04e12bf38f1f3).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2015/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94616/testReport)** for PR 22037 at commit [`0384a1a`](https://github.com/apache/spark/commit/0384a1af69573af317f9e644bcf04e12bf38f1f3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209412578
--- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -494,6 +522,68 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
checkAnswer(df, expected)
}
+ test("Logical type: Decimal") {
+ val expected = Seq("1.23", "4.56", "78.90", "-1", "-2.31")
+ .map { x => Row(new java.math.BigDecimal(x), new java.math.BigDecimal(x)) }
+ val df = spark.read.format("avro").load(decimalAvro)
+
+ checkAnswer(df, expected)
+
+ val avroSchema = s"""
+ {
+ "namespace": "logical",
+ "type": "record",
+ "name": "test",
+ "fields": [
+ {"name": "bytes", "type":
+ {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
+ },
+ {"name": "fixed", "type":
+ {"type": "fixed", "size": 5, "logicalType": "decimal",
+ "precision": 4, "scale": 2, "name": "foo"}
+ }
+ ]
+ }
+ """
+
+ checkAnswer(spark.read.format("avro").option("avroSchema", avroSchema).load(decimalAvro),
+ expected)
+
+ withTempPath { dir =>
+ df.write.format("avro").save(dir.toString)
+ checkAnswer(spark.read.format("avro").load(dir.toString), expected)
+ }
+ }
+
+ test("Logical type: Decimal with too large precision") {
+ withTempDir { dir =>
+ val schema = new Schema.Parser().parse("""{
+ "namespace": "logical",
+ "type": "record",
+ "name": "test",
+ "fields": [{
+ "name": "decimal",
+ "type": {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
+ }]
+ }""")
+ val datumWriter = new GenericDatumWriter[GenericRecord](schema)
+ val dataFileWriter = new DataFileWriter[GenericRecord](datumWriter)
+ dataFileWriter.create(schema, new File(s"$dir.avro"))
--- End diff --
Let's either always use python to write test files, or always use java.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22037
thanks, merging to master!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94558/testReport)** for PR 22037 at commit [`24dbada`](https://github.com/apache/spark/commit/24dbada0823e47b50892a34d19e1b8e2a63af7c3).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209129230
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ---
@@ -138,10 +142,21 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
bytes
case b: Array[Byte] => b
case other => throw new RuntimeException(s"$other is not a valid avro binary.")
-
}
updater.set(ordinal, bytes)
+ case (FIXED, d: DecimalType) => (updater, ordinal, value) =>
+ val bigDecimal = decimalConversions.fromFixed(value.asInstanceOf[GenericFixed], avroType,
+ LogicalTypes.decimal(d.precision, d.scale))
--- End diff --
parquet can convert binary to unscaled long directly, shall we follow?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94552/testReport)** for PR 22037 at commit [`24dbada`](https://github.com/apache/spark/commit/24dbada0823e47b50892a34d19e1b8e2a63af7c3).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1945/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209152562
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ---
@@ -138,10 +142,21 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
bytes
case b: Array[Byte] => b
case other => throw new RuntimeException(s"$other is not a valid avro binary.")
-
}
updater.set(ordinal, bytes)
+ case (FIXED, d: DecimalType) => (updater, ordinal, value) =>
+ val bigDecimal = decimalConversions.fromFixed(value.asInstanceOf[GenericFixed], avroType,
+ LogicalTypes.decimal(d.precision, d.scale))
--- End diff --
Comparing to `binaryToUnscaledLong`, I think using the method from Avro library makes more sense.
Also the method `binaryToUnscaledLong` is using the underlying byte array of parquet Binary without copying it. (If we create a new Util method for both, then Parquet data source will lose this optimization.)
For performance consideration, we can create a similar method in Avro. I tried the function `binaryToUnscaledLong` in Avro and it works. I can change it if you insist.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209417216
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -114,32 +129,35 @@ object SchemaConverters {
prevNameSpace: String = "",
outputTimestampType: AvroOutputTimestampType.Value = AvroOutputTimestampType.TIMESTAMP_MICROS)
: Schema = {
- val builder = if (nullable) {
- SchemaBuilder.builder().nullable()
- } else {
- SchemaBuilder.builder()
- }
+ val builder = SchemaBuilder.builder()
- catalystType match {
+ val schema = catalystType match {
case BooleanType => builder.booleanType()
case ByteType | ShortType | IntegerType => builder.intType()
case LongType => builder.longType()
- case DateType => builder
- .intBuilder()
- .prop(LogicalType.LOGICAL_TYPE_PROP, LogicalTypes.date().getName)
- .endInt()
+ case DateType =>
+ LogicalTypes.date().addToSchema(builder.intType())
case TimestampType =>
val timestampType = outputTimestampType match {
case AvroOutputTimestampType.TIMESTAMP_MILLIS => LogicalTypes.timestampMillis()
case AvroOutputTimestampType.TIMESTAMP_MICROS => LogicalTypes.timestampMicros()
case other =>
throw new IncompatibleSchemaException(s"Unexpected output timestamp type $other.")
}
- builder.longBuilder().prop(LogicalType.LOGICAL_TYPE_PROP, timestampType.getName).endLong()
+ timestampType.addToSchema(builder.longType())
case FloatType => builder.floatType()
case DoubleType => builder.doubleType()
- case _: DecimalType | StringType => builder.stringType()
+ case StringType => builder.stringType()
+ case d: DecimalType =>
+ val avroType = LogicalTypes.decimal(d.precision, d.scale)
+ val fixedSize = minBytesForPrecision(d.precision)
+ // Use random name to avoid conflict in naming of fixed field.
+ // Field names must start with [A-Za-z_], while the charset of Random.alphanumeric contains
+ // [0-9]. So add a single character "f" to ensure the name is valid.
+ val name = "f" + Random.alphanumeric.take(32).mkString("")
--- End diff --
No, if there are two decimal fields, then there will be name conflict. I tried.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94623/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94428/testReport)** for PR 22037 at commit [`e9dc019`](https://github.com/apache/spark/commit/e9dc01960039c6689c17b3552d7207ccd1acc14d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2084/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2033/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r208624572
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ---
@@ -138,10 +138,24 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
bytes
case b: Array[Byte] => b
case other => throw new RuntimeException(s"$other is not a valid avro binary.")
-
}
updater.set(ordinal, bytes)
+ case (FIXED, _: DecimalType) => (updater, ordinal, value) =>
+ val decimal = Decimal(value.asInstanceOf[GenericFixed].bytes())
--- End diff --
shall we call `decimal.toPrecision(...)`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94538/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94538/testReport)** for PR 22037 at commit [`1c5d228`](https://github.com/apache/spark/commit/1c5d228a05af6bab0c89a432b5446647746f117d).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94538/testReport)** for PR 22037 at commit [`1c5d228`](https://github.com/apache/spark/commit/1c5d228a05af6bab0c89a432b5446647746f117d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94552/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209043142
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -139,7 +142,16 @@ object SchemaConverters {
case FloatType => builder.floatType()
case DoubleType => builder.doubleType()
- case _: DecimalType | StringType => builder.stringType()
+ case StringType => builder.stringType()
+ case d: DecimalType =>
+ val avroType = LogicalTypes.decimal(d.precision, d.scale)
+ if (nullable) {
+ val schema = avroType.addToSchema(SchemaBuilder.builder().bytesType())
+ builder.`type`(schema)
--- End diff --
No, the keys are private. Also there is no method like `fixedBuilder` (like `builder.longBuilder().prop(...)`)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94427/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94558/testReport)** for PR 22037 at commit [`24dbada`](https://github.com/apache/spark/commit/24dbada0823e47b50892a34d19e1b8e2a63af7c3).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94623/testReport)** for PR 22037 at commit [`6b11292`](https://github.com/apache/spark/commit/6b11292a319dad052590e15aaec097ff636bff7f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94519/testReport)** for PR 22037 at commit [`1c5d228`](https://github.com/apache/spark/commit/1c5d228a05af6bab0c89a432b5446647746f117d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94546/testReport)** for PR 22037 at commit [`9eefbe5`](https://github.com/apache/spark/commit/9eefbe5dc58bba272dedce7ae0174be89a0a9b28).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94428/testReport)** for PR 22037 at commit [`e9dc019`](https://github.com/apache/spark/commit/e9dc01960039c6689c17b3552d7207ccd1acc14d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209083275
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -139,7 +152,22 @@ object SchemaConverters {
case FloatType => builder.floatType()
case DoubleType => builder.doubleType()
- case _: DecimalType | StringType => builder.stringType()
+ case StringType => builder.stringType()
+ case d: DecimalType =>
+ val avroType = LogicalTypes.decimal(d.precision, d.scale)
+ val fixedSize = minBytesForPrecision(d.precision)
+ // Use random name to avoid conflict in naming of fixed field.
+ // Field names must start with [A-Za-z_], while the charset of Random.alphanumeric contains
+ // [0-9]. So add a single character "f" to ensure the name is valid.
+ val name = "f" + Random.alphanumeric.take(32).mkString("")
+ if (nullable) {
+ val schema = avroType.addToSchema(
+ SchemaBuilder.builder().fixed(name).size(fixedSize))
+ builder.`type`(schema)
+ } else {
+ avroType.addToSchema(builder.fixed(name).size(fixedSize))
--- End diff --
Why we don't need call `type` as above?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94428/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94628/testReport)** for PR 22037 at commit [`508a340`](https://github.com/apache/spark/commit/508a340d0b3d85d062a519490e79978b123c73ac).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/22037
retest this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22037
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r208627179
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -139,7 +142,16 @@ object SchemaConverters {
case FloatType => builder.floatType()
case DoubleType => builder.doubleType()
- case _: DecimalType | StringType => builder.stringType()
+ case StringType => builder.stringType()
+ case d: DecimalType =>
+ val avroType = LogicalTypes.decimal(d.precision, d.scale)
+ if (nullable) {
+ val schema = avroType.addToSchema(SchemaBuilder.builder().bytesType())
+ builder.`type`(schema)
--- End diff --
are the precision and scale prop key public? If they are, we should still set the props. It's more important to keep the code consistent.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2056/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209140501
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -139,7 +152,22 @@ object SchemaConverters {
case FloatType => builder.floatType()
case DoubleType => builder.doubleType()
- case _: DecimalType | StringType => builder.stringType()
+ case StringType => builder.stringType()
+ case d: DecimalType =>
+ val avroType = LogicalTypes.decimal(d.precision, d.scale)
+ val fixedSize = minBytesForPrecision(d.precision)
+ // Use random name to avoid conflict in naming of fixed field.
+ // Field names must start with [A-Za-z_], while the charset of Random.alphanumeric contains
+ // [0-9]. So add a single character "f" to ensure the name is valid.
+ val name = "f" + Random.alphanumeric.take(32).mkString("")
+ if (nullable) {
+ val schema = avroType.addToSchema(
+ SchemaBuilder.builder().fixed(name).size(fixedSize))
+ builder.`type`(schema)
+ } else {
+ avroType.addToSchema(builder.fixed(name).size(fixedSize))
--- End diff --
Here we can add the schema to `builder` directly. If the builder is nullable, we need to create schema with logical type and then add it to the nullable builder (complete the type as `union` with `null`)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2081/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94427/testReport)** for PR 22037 at commit [`bbc5434`](https://github.com/apache/spark/commit/bbc54342e64428d96e4beb0701831ff471f991e7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94628/testReport)** for PR 22037 at commit [`508a340`](https://github.com/apache/spark/commit/508a340d0b3d85d062a519490e79978b123c73ac).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94579/testReport)** for PR 22037 at commit [`0384a1a`](https://github.com/apache/spark/commit/0384a1af69573af317f9e644bcf04e12bf38f1f3).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209162410
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ---
@@ -138,10 +142,21 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
bytes
case b: Array[Byte] => b
case other => throw new RuntimeException(s"$other is not a valid avro binary.")
-
}
updater.set(ordinal, bytes)
+ case (FIXED, d: DecimalType) => (updater, ordinal, value) =>
+ val bigDecimal = decimalConversions.fromFixed(value.asInstanceOf[GenericFixed], avroType,
+ LogicalTypes.decimal(d.precision, d.scale))
--- End diff --
ok let's leave it. We can always add later.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94519/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2040/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209067476
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -479,6 +481,26 @@ object Decimal {
dec
}
+ // Max precision of a decimal value stored in `numBytes` bytes
+ def maxPrecisionForBytes(numBytes: Int): Int = {
+ Math.round( // convert double to long
+ Math.floor(Math.log10( // number of base-10 digits
+ Math.pow(2, 8 * numBytes - 1) - 1))) // max value stored in numBytes
+ .asInstanceOf[Int]
+ }
+
+ // Returns the minimum number of bytes needed to store a decimal with a given `precision`.
+ lazy val minBytesForPrecision = Array.tabulate[Int](39)(computeMinBytesForPrecision)
+
+
--- End diff --
nit: seems an extra blank line here.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1946/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2025/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2038/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r208699939
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -86,7 +86,8 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
case DoubleType =>
(getter, ordinal) => getter.getDouble(ordinal)
case d: DecimalType =>
- (getter, ordinal) => getter.getDecimal(ordinal, d.precision, d.scale).toString
+ (getter, ordinal) =>
+ ByteBuffer.wrap(getter.getDecimal(ordinal, d.precision, d.scale).toString.getBytes)
--- End diff --
I will update it to `decimal.toJavaBigDecimal.unscaledValue().toByteArray`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22037
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r208591866
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -139,7 +142,16 @@ object SchemaConverters {
case FloatType => builder.floatType()
case DoubleType => builder.doubleType()
- case _: DecimalType | StringType => builder.stringType()
+ case StringType => builder.stringType()
+ case d: DecimalType =>
+ val avroType = LogicalTypes.decimal(d.precision, d.scale)
+ if (nullable) {
+ val schema = avroType.addToSchema(SchemaBuilder.builder().bytesType())
+ builder.`type`(schema)
--- End diff --
For decimal, it requires to set three keys in props: `logicalType`, `precision`, `scale`.
So here I choose to use such way, instead of setting properties like logical Timestamp/Date type.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94558/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94628/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209130685
--- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -475,6 +498,41 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
checkAnswer(df, expected)
}
+ test("Logical type: Decimal") {
+ val expected = Seq((1.23, 45.67), (65.37, 81.39))
+ .map { d =>
+ Row(new java.math.BigDecimal(d._1.toString), new java.math.BigDecimal(d._2.toString))
+ }
+ val df = spark.read.format("avro").load(decimalAvro)
+
+ checkAnswer(df, expected)
+
+ val avroSchema = s"""
+ {
+ "namespace": "logical",
+ "type": "record",
+ "name": "test",
+ "fields": [
+ {"name": "bytes", "type":
+ {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
+ },
+ {"name": "fixed", "type":
+ {"type": "fixed", "size": 5, "logicalType": "decimal",
+ "precision": 4, "scale": 2, "name": "foo"}
--- End diff --
One option might be to use json4s and convert it to JSON string.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94552/testReport)** for PR 22037 at commit [`24dbada`](https://github.com/apache/spark/commit/24dbada0823e47b50892a34d19e1b8e2a63af7c3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r208660814
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -139,7 +142,16 @@ object SchemaConverters {
case FloatType => builder.floatType()
case DoubleType => builder.doubleType()
- case _: DecimalType | StringType => builder.stringType()
+ case StringType => builder.stringType()
+ case d: DecimalType =>
+ val avroType = LogicalTypes.decimal(d.precision, d.scale)
+ if (nullable) {
+ val schema = avroType.addToSchema(SchemaBuilder.builder().bytesType())
--- End diff --
no, in the next line `builder` is nullable.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r208626202
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -139,7 +142,16 @@ object SchemaConverters {
case FloatType => builder.floatType()
case DoubleType => builder.doubleType()
- case _: DecimalType | StringType => builder.stringType()
+ case StringType => builder.stringType()
+ case d: DecimalType =>
+ val avroType = LogicalTypes.decimal(d.precision, d.scale)
+ if (nullable) {
+ val schema = avroType.addToSchema(SchemaBuilder.builder().bytesType())
--- End diff --
do we loss nullable information here?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94616/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209127634
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -455,6 +455,8 @@ object Decimal {
def apply(unscaled: Long, precision: Int, scale: Int): Decimal =
new Decimal().set(unscaled, precision, scale)
+ def apply(value: Array[Byte]): Decimal = Decimal(value.map(_.toChar).mkString)
--- End diff --
why do we need it?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22037
**[Test build #94427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94427/testReport)** for PR 22037 at commit [`bbc5434`](https://github.com/apache/spark/commit/bbc54342e64428d96e4beb0701831ff471f991e7).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2088/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r209412587
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -114,32 +129,35 @@ object SchemaConverters {
prevNameSpace: String = "",
outputTimestampType: AvroOutputTimestampType.Value = AvroOutputTimestampType.TIMESTAMP_MICROS)
: Schema = {
- val builder = if (nullable) {
- SchemaBuilder.builder().nullable()
- } else {
- SchemaBuilder.builder()
- }
+ val builder = SchemaBuilder.builder()
- catalystType match {
+ val schema = catalystType match {
case BooleanType => builder.booleanType()
case ByteType | ShortType | IntegerType => builder.intType()
case LongType => builder.longType()
- case DateType => builder
- .intBuilder()
- .prop(LogicalType.LOGICAL_TYPE_PROP, LogicalTypes.date().getName)
- .endInt()
+ case DateType =>
+ LogicalTypes.date().addToSchema(builder.intType())
case TimestampType =>
val timestampType = outputTimestampType match {
case AvroOutputTimestampType.TIMESTAMP_MILLIS => LogicalTypes.timestampMillis()
case AvroOutputTimestampType.TIMESTAMP_MICROS => LogicalTypes.timestampMicros()
case other =>
throw new IncompatibleSchemaException(s"Unexpected output timestamp type $other.")
}
- builder.longBuilder().prop(LogicalType.LOGICAL_TYPE_PROP, timestampType.getName).endLong()
+ timestampType.addToSchema(builder.longType())
case FloatType => builder.floatType()
case DoubleType => builder.doubleType()
- case _: DecimalType | StringType => builder.stringType()
+ case StringType => builder.stringType()
+ case d: DecimalType =>
+ val avroType = LogicalTypes.decimal(d.precision, d.scale)
+ val fixedSize = minBytesForPrecision(d.precision)
+ // Use random name to avoid conflict in naming of fixed field.
+ // Field names must start with [A-Za-z_], while the charset of Random.alphanumeric contains
+ // [0-9]. So add a single character "f" to ensure the name is valid.
+ val name = "f" + Random.alphanumeric.take(32).mkString("")
--- End diff --
can we use `recordName` here?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94546/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r208655309
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ---
@@ -138,10 +138,24 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
bytes
case b: Array[Byte] => b
case other => throw new RuntimeException(s"$other is not a valid avro binary.")
-
}
updater.set(ordinal, bytes)
+ case (FIXED, _: DecimalType) => (updater, ordinal, value) =>
+ val decimal = Decimal(value.asInstanceOf[GenericFixed].bytes())
--- End diff --
No, I don't think we need. Normally Avro records is validated with the schema before written. So the precision of input here is supposed be valid.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22037#discussion_r208625409
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -86,7 +86,8 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
case DoubleType =>
(getter, ordinal) => getter.getDouble(ordinal)
case d: DecimalType =>
- (getter, ordinal) => getter.getDecimal(ordinal, d.precision, d.scale).toString
+ (getter, ordinal) =>
+ ByteBuffer.wrap(getter.getDecimal(ordinal, d.precision, d.scale).toString.getBytes)
--- End diff --
`toString.getBytes`? that would be pretty slow, how does avro store decimal physically?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22037
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org