You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dilipbiswal <gi...@git.apache.org> on 2018/10/03 00:55:02 UTC
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
GitHub user dilipbiswal opened a pull request:
https://github.com/apache/spark/pull/22619
[SQL][MINOR] Make use of TypeCoercion.findTightestCommonType while inferring CSV schema.
## What changes were proposed in this pull request?
Current the CSV's infer schema code inlines `TypeCoercion.findTightestCommonType`. This is a minor refactor to make use of the common type coercion code when applicable. This way we can take advantage of any improvement to the base method.
Thanks to @MaxGekk for finding this while reviewing another PR.
## How was this patch tested?
This is a minor refactor. Existing tests are used to verify the change.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dilipbiswal/spark csv_minor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22619.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22619
----
commit d4e0bdb5a06f59ff1df3933cb43804e71cde8259
Author: Dilip Biswal <db...@...>
Date: 2018-10-03T00:47:42Z
Make use of TypeCoercion.findTightestCommonType while inferring CSV schema
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96890/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22619
@HyukjinKwon Okay.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22619
@HyukjinKwon Does this look okay now ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3659/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22619
Merged to master.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222197363
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
- case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
- Some(DoubleType)
-
- case (t1: DecimalType, t2: DecimalType) =>
- val scale = math.max(t1.scale, t2.scale)
- val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
- if (range + scale > 38) {
- // DecimalType can't support precision > 38
- Some(DoubleType)
- } else {
- Some(DecimalType(range + scale, scale))
+ def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+ TypeCoercion.findTightestCommonType(t1, t2).orElse {
+ (t1, t2) match {
--- End diff --
BTW, let's keep the comments in the original place.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222200750
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
- case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
- Some(DoubleType)
-
- case (t1: DecimalType, t2: DecimalType) =>
- val scale = math.max(t1.scale, t2.scale)
- val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
- if (range + scale > 38) {
- // DecimalType can't support precision > 38
- Some(DoubleType)
- } else {
- Some(DecimalType(range + scale, scale))
+ def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+ TypeCoercion.findTightestCommonType(t1, t2).orElse {
+ (t1, t2) match {
--- End diff --
not sure. maybe just `findCompatibleTypeForCSV`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22619
**[Test build #96881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96881/testReport)** for PR 22619 at commit [`d4e0bdb`](https://github.com/apache/spark/commit/d4e0bdb5a06f59ff1df3933cb43804e71cde8259).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22619
**[Test build #96881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96881/testReport)** for PR 22619 at commit [`d4e0bdb`](https://github.com/apache/spark/commit/d4e0bdb5a06f59ff1df3933cb43804e71cde8259).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222200361
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
- case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
- Some(DoubleType)
-
- case (t1: DecimalType, t2: DecimalType) =>
- val scale = math.max(t1.scale, t2.scale)
- val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
- if (range + scale > 38) {
- // DecimalType can't support precision > 38
- Some(DoubleType)
- } else {
- Some(DecimalType(range + scale, scale))
+ def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
--- End diff --
`compatibleType` is also fine if it is consistent with `JsonInferSchema`.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3644/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222195632
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
- case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
- Some(DoubleType)
-
- case (t1: DecimalType, t2: DecimalType) =>
- val scale = math.max(t1.scale, t2.scale)
- val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
- if (range + scale > 38) {
- // DecimalType can't support precision > 38
- Some(DoubleType)
- } else {
- Some(DecimalType(range + scale, scale))
+ def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+ TypeCoercion.findTightestCommonType(t1, t2).orElse {
+ (t1, t2) match {
+ case (StringType, t2) => Some(StringType)
+ case (t1, StringType) => Some(StringType)
+
+ case (t1: IntegralType, t2: DecimalType) =>
+ compatibleType(DecimalType.forType(t1), t2)
+ case (t1: DecimalType, t2: IntegralType) =>
+ compatibleType(t1, DecimalType.forType(t2))
+
+ case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
+ Some(DoubleType)
+
+ case (t1: DecimalType, t2: DecimalType) =>
+ val scale = math.max(t1.scale, t2.scale)
+ val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
+ if (range + scale > 38) {
+ // DecimalType can't support precision > 38
+ Some(DoubleType)
+ } else {
+ Some(DecimalType(range + scale, scale))
+ }
+ case (_, _) => None
--- End diff --
@HyukjinKwon Thanks. Will change.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22619
**[Test build #96890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96890/testReport)** for PR 22619 at commit [`ad69a1b`](https://github.com/apache/spark/commit/ad69a1ba90530cca51b650e67eb060b689c51e1d).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercio...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222231401
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,35 +172,27 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
+ /**
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
+ */
+ private def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+ TypeCoercion.findTightestCommonType(t1, t2).orElse (findCompatibleTypeForCSV(t1, t2))
--- End diff --
nit: `e (` -> `e(`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22619
**[Test build #96897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96897/testReport)** for PR 22619 at commit [`9e656a8`](https://github.com/apache/spark/commit/9e656a863df6878ab121371e1d820762f221bbe4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22619
Yup. Let me leave this open few more days in case.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222197242
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
- case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
- Some(DoubleType)
-
- case (t1: DecimalType, t2: DecimalType) =>
- val scale = math.max(t1.scale, t2.scale)
- val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
- if (range + scale > 38) {
- // DecimalType can't support precision > 38
- Some(DoubleType)
- } else {
- Some(DecimalType(range + scale, scale))
+ def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+ TypeCoercion.findTightestCommonType(t1, t2).orElse {
+ (t1, t2) match {
--- End diff --
Can we leave this out as a private val like the previous and leave a comment that this pattern matching is CSV specific? That will reduce the diff and makes the review easier.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22619
@HyukjinKwon Sure :-)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222198784
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
- case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
- Some(DoubleType)
-
- case (t1: DecimalType, t2: DecimalType) =>
- val scale = math.max(t1.scale, t2.scale)
- val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
- if (range + scale > 38) {
- // DecimalType can't support precision > 38
- Some(DoubleType)
- } else {
- Some(DecimalType(range + scale, scale))
+ def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
--- End diff --
@viirya i kept the same name used in JsonInferSchema. Change that as well ? Or only change this ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22619
**[Test build #96890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96890/testReport)** for PR 22619 at commit [`ad69a1b`](https://github.com/apache/spark/commit/ad69a1ba90530cca51b650e67eb060b689c51e1d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22619
Looks okay - I checked a case one by one but it needs another look.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/22619
Maybe this is related to #22448.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22619
@gatorsmile There should not be any behaviour change. I was thinking that existing test cases should suffice. Basically we used to duplicate the code of TypeCoercion.findTightestCommonType in here. Here i am just reusing the common function.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222194957
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
- case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
- Some(DoubleType)
-
- case (t1: DecimalType, t2: DecimalType) =>
- val scale = math.max(t1.scale, t2.scale)
- val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
- if (range + scale > 38) {
- // DecimalType can't support precision > 38
- Some(DoubleType)
- } else {
- Some(DecimalType(range + scale, scale))
+ def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+ TypeCoercion.findTightestCommonType(t1, t2).orElse {
+ (t1, t2) match {
+ case (StringType, t2) => Some(StringType)
+ case (t1, StringType) => Some(StringType)
+
+ case (t1: IntegralType, t2: DecimalType) =>
+ compatibleType(DecimalType.forType(t1), t2)
+ case (t1: DecimalType, t2: IntegralType) =>
+ compatibleType(t1, DecimalType.forType(t2))
+
+ case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
+ Some(DoubleType)
+
+ case (t1: DecimalType, t2: DecimalType) =>
+ val scale = math.max(t1.scale, t2.scale)
+ val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
+ if (range + scale > 38) {
+ // DecimalType can't support precision > 38
+ Some(DoubleType)
+ } else {
+ Some(DecimalType(range + scale, scale))
+ }
+ case (_, _) => None
--- End diff --
`case _ => None`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercio...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222231313
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -216,7 +208,7 @@ private[csv] object CSVInferSchema {
} else {
Some(DecimalType(range + scale, scale))
}
-
case _ => None
}
+
--- End diff --
Let's get rid of new lines changes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222198383
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
- case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
- Some(DoubleType)
-
- case (t1: DecimalType, t2: DecimalType) =>
- val scale = math.max(t1.scale, t2.scale)
- val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
- if (range + scale > 38) {
- // DecimalType can't support precision > 38
- Some(DoubleType)
- } else {
- Some(DecimalType(range + scale, scale))
+ def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
--- End diff --
nit: `findCompatibleType`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22619
cc @HyukjinKwon @MaxGekk
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222199535
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
- case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
- Some(DoubleType)
-
- case (t1: DecimalType, t2: DecimalType) =>
- val scale = math.max(t1.scale, t2.scale)
- val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
- if (range + scale > 38) {
- // DecimalType can't support precision > 38
- Some(DoubleType)
- } else {
- Some(DecimalType(range + scale, scale))
+ def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
+ TypeCoercion.findTightestCommonType(t1, t2).orElse {
+ (t1, t2) match {
--- End diff --
@HyukjinKwon Did you have any preference or suggestion on the name of the val ? findCommonTypeExtended ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96881/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercio...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22619
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222198257
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
--- End diff --
Some comments here are ignored in the change. Shall we keep them?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22619
**[Test build #96897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96897/testReport)** for PR 22619 at commit [`9e656a8`](https://github.com/apache/spark/commit/9e656a863df6878ab121371e1d820762f221bbe4).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3652/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/22619
@ueshin
> Maybe this is related to #22448.
Yeah.. Actually @MaxGekk had pointed me to the presence of duplicate code in one of his comment. I was trying to address it in here.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22619
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96897/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22619
Any behavior change? Test cases?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22619
Let's just file a JIRA @dilipbiswal BTW.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22619#discussion_r222198591
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
StringType
}
- private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
-
/**
- * Copied from internal Spark api
- * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
+ * Returns the common data type given two input data types so that the return type
+ * is compatible with both input data types.
*/
- val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
- case (t1, t2) if t1 == t2 => Some(t1)
- case (NullType, t1) => Some(t1)
- case (t1, NullType) => Some(t1)
- case (StringType, t2) => Some(StringType)
- case (t1, StringType) => Some(StringType)
-
- // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
- case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
- val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
- Some(numericPrecedence(index))
-
- // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
- case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
- Some(t2)
- case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
- Some(t1)
-
- // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
- case (t1: IntegralType, t2: DecimalType) =>
- findTightestCommonType(DecimalType.forType(t1), t2)
- case (t1: DecimalType, t2: IntegralType) =>
- findTightestCommonType(t1, DecimalType.forType(t2))
-
- // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
- // in most case, also have better precision.
--- End diff --
@viirya Yeah.. we should keep.. sorry.. got dropped inadvertently.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org