Posted to reviews@spark.apache.org by dilipbiswal <gi...@git.apache.org> on 2018/10/03 00:55:02 UTC

[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/22619

    [SQL][MINOR] Make use of TypeCoercion.findTightestCommonType while inferring CSV schema.

    ## What changes were proposed in this pull request?
    Currently, the CSV schema inference code inlines `TypeCoercion.findTightestCommonType`. This is a minor refactor to make use of the common type coercion code when applicable. This way we can take advantage of any improvements to the base method.
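
    At a high level, the new shape is roughly the following (a condensed sketch; the full diff is quoted in the review comments below):

    ```scala
    import org.apache.spark.sql.catalyst.analysis.TypeCoercion
    import org.apache.spark.sql.types._

    def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
      // Try the shared coercion rules first ...
      TypeCoercion.findTightestCommonType(t1, t2).orElse {
        // ... and fall back to CSV-specific widening rules otherwise.
        (t1, t2) match {
          case (StringType, _) | (_, StringType) => Some(StringType)
          case (i: IntegralType, d: DecimalType) => compatibleType(DecimalType.forType(i), d)
          case (d: DecimalType, i: IntegralType) => compatibleType(d, DecimalType.forType(i))
          case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) => Some(DoubleType)
          case _ => None
        }
      }
    }
    ```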
    
    Thanks to @MaxGekk for finding this while reviewing another PR.
    
    ## How was this patch tested?
    This is a minor refactor; existing tests are used to verify the change.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark csv_minor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22619.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22619
    
----
commit d4e0bdb5a06f59ff1df3933cb43804e71cde8259
Author: Dilip Biswal <db...@...>
Date:   2018-10-03T00:47:42Z

    Make use of TypeCoercion.findTightestCommonType while inferring CSV schema

----


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96890/
    Test PASSed.


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    @HyukjinKwon Okay.


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    @HyukjinKwon Does this look okay now?


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3659/
    Test PASSed.


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Merged to master.


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222197363
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    -    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    -      Some(DoubleType)
    -
    -    case (t1: DecimalType, t2: DecimalType) =>
    -      val scale = math.max(t1.scale, t2.scale)
    -      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    -      if (range + scale > 38) {
    -        // DecimalType can't support precision > 38
    -        Some(DoubleType)
    -      } else {
    -        Some(DecimalType(range + scale, scale))
    +  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
    +    TypeCoercion.findTightestCommonType(t1, t2).orElse {
    +      (t1, t2) match {
    --- End diff --
    
    BTW, let's keep the comments in the original place.


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222200750
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    -    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    -      Some(DoubleType)
    -
    -    case (t1: DecimalType, t2: DecimalType) =>
    -      val scale = math.max(t1.scale, t2.scale)
    -      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    -      if (range + scale > 38) {
    -        // DecimalType can't support precision > 38
    -        Some(DoubleType)
    -      } else {
    -        Some(DecimalType(range + scale, scale))
    +  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
    +    TypeCoercion.findTightestCommonType(t1, t2).orElse {
    +      (t1, t2) match {
    --- End diff --
    
    not sure. maybe just `findCompatibleTypeForCSV`


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    **[Test build #96881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96881/testReport)** for PR 22619 at commit [`d4e0bdb`](https://github.com/apache/spark/commit/d4e0bdb5a06f59ff1df3933cb43804e71cde8259).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    **[Test build #96881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96881/testReport)** for PR 22619 at commit [`d4e0bdb`](https://github.com/apache/spark/commit/d4e0bdb5a06f59ff1df3933cb43804e71cde8259).


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222200361
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    -    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    -      Some(DoubleType)
    -
    -    case (t1: DecimalType, t2: DecimalType) =>
    -      val scale = math.max(t1.scale, t2.scale)
    -      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    -      if (range + scale > 38) {
    -        // DecimalType can't support precision > 38
    -        Some(DoubleType)
    -      } else {
    -        Some(DecimalType(range + scale, scale))
    +  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
    --- End diff --
    
    `compatibleType` is also fine if it is consistent with `JsonInferSchema`.


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3644/
    Test PASSed.


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222195632
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    -    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    -      Some(DoubleType)
    -
    -    case (t1: DecimalType, t2: DecimalType) =>
    -      val scale = math.max(t1.scale, t2.scale)
    -      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    -      if (range + scale > 38) {
    -        // DecimalType can't support precision > 38
    -        Some(DoubleType)
    -      } else {
    -        Some(DecimalType(range + scale, scale))
    +  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
    +    TypeCoercion.findTightestCommonType(t1, t2).orElse {
    +      (t1, t2) match {
    +        case (StringType, t2) => Some(StringType)
    +        case (t1, StringType) => Some(StringType)
    +
    +        case (t1: IntegralType, t2: DecimalType) =>
    +          compatibleType(DecimalType.forType(t1), t2)
    +        case (t1: DecimalType, t2: IntegralType) =>
    +          compatibleType(t1, DecimalType.forType(t2))
    +
    +        case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    +          Some(DoubleType)
    +
    +        case (t1: DecimalType, t2: DecimalType) =>
    +          val scale = math.max(t1.scale, t2.scale)
    +          val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    +          if (range + scale > 38) {
    +            // DecimalType can't support precision > 38
    +            Some(DoubleType)
    +          } else {
    +            Some(DecimalType(range + scale, scale))
    +          }
    +        case (_, _) => None
    --- End diff --
    
    @HyukjinKwon Thanks. Will change.


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    **[Test build #96890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96890/testReport)** for PR 22619 at commit [`ad69a1b`](https://github.com/apache/spark/commit/ad69a1ba90530cca51b650e67eb060b689c51e1d).


---



[GitHub] spark pull request #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222231401
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,35 +172,27 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    +  /**
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
    +   */
    +  private def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
    +    TypeCoercion.findTightestCommonType(t1, t2).orElse (findCompatibleTypeForCSV(t1, t2))
    --- End diff --
    
    nit: `e (` -> `e(`


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    **[Test build #96897 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96897/testReport)** for PR 22619 at commit [`9e656a8`](https://github.com/apache/spark/commit/9e656a863df6878ab121371e1d820762f221bbe4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Yup. Let me leave this open a few more days, just in case.


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222197242
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    -    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    -      Some(DoubleType)
    -
    -    case (t1: DecimalType, t2: DecimalType) =>
    -      val scale = math.max(t1.scale, t2.scale)
    -      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    -      if (range + scale > 38) {
    -        // DecimalType can't support precision > 38
    -        Some(DoubleType)
    -      } else {
    -        Some(DecimalType(range + scale, scale))
    +  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
    +    TypeCoercion.findTightestCommonType(t1, t2).orElse {
    +      (t1, t2) match {
    --- End diff --
    
    Can we factor this out as a private val like before, and leave a comment that this pattern matching is CSV-specific? That will reduce the diff and make the review easier.
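
    For illustration only, the layout being suggested might look roughly like this (the name `findCompatibleTypeForCSV` follows the naming discussed elsewhere in this thread):

    ```scala
    private def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
      TypeCoercion.findTightestCommonType(t1, t2).orElse(findCompatibleTypeForCSV(t1, t2))
    }

    // The rules below are CSV-specific, so they stay out of the shared TypeCoercion helper.
    private val findCompatibleTypeForCSV: (DataType, DataType) => Option[DataType] = {
      case (StringType, _) | (_, StringType) => Some(StringType)
      // ... the remaining decimal/double widening cases from the diff above ...
      case _ => None
    }
    ```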


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    @HyukjinKwon Sure :-)


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222198784
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    -    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    -      Some(DoubleType)
    -
    -    case (t1: DecimalType, t2: DecimalType) =>
    -      val scale = math.max(t1.scale, t2.scale)
    -      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    -      if (range + scale > 38) {
    -        // DecimalType can't support precision > 38
    -        Some(DoubleType)
    -      } else {
    -        Some(DecimalType(range + scale, scale))
    +  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
    --- End diff --
    
    @viirya I kept the same name used in JsonInferSchema. Change that as well? Or only change this?


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    **[Test build #96890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96890/testReport)** for PR 22619 at commit [`ad69a1b`](https://github.com/apache/spark/commit/ad69a1ba90530cca51b650e67eb060b689c51e1d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Looks okay - I checked each case one by one, but it needs another look.


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Maybe this is related to #22448.


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    @gatorsmile There should not be any behaviour change, so I was thinking that the existing test cases should suffice. Basically, we used to duplicate the code of TypeCoercion.findTightestCommonType here; I am just reusing the common function.
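
    As a quick sanity check (a hypothetical spark-shell snippet, not part of the patch), mixed integer/fractional columns should infer the same way before and after:

    ```scala
    import spark.implicits._

    // One CSV column with an integer row and a double row.
    val csvLines = Seq("1", "3.5").toDS()
    val df = spark.read.option("inferSchema", "true").csv(csvLines)

    df.printSchema()
    // root
    //  |-- _c0: double (nullable = true)   // int and double still widen to double
    ```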


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222194957
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    -    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    -      Some(DoubleType)
    -
    -    case (t1: DecimalType, t2: DecimalType) =>
    -      val scale = math.max(t1.scale, t2.scale)
    -      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    -      if (range + scale > 38) {
    -        // DecimalType can't support precision > 38
    -        Some(DoubleType)
    -      } else {
    -        Some(DecimalType(range + scale, scale))
    +  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
    +    TypeCoercion.findTightestCommonType(t1, t2).orElse {
    +      (t1, t2) match {
    +        case (StringType, t2) => Some(StringType)
    +        case (t1, StringType) => Some(StringType)
    +
    +        case (t1: IntegralType, t2: DecimalType) =>
    +          compatibleType(DecimalType.forType(t1), t2)
    +        case (t1: DecimalType, t2: IntegralType) =>
    +          compatibleType(t1, DecimalType.forType(t2))
    +
    +        case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    +          Some(DoubleType)
    +
    +        case (t1: DecimalType, t2: DecimalType) =>
    +          val scale = math.max(t1.scale, t2.scale)
    +          val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    +          if (range + scale > 38) {
    +            // DecimalType can't support precision > 38
    +            Some(DoubleType)
    +          } else {
    +            Some(DecimalType(range + scale, scale))
    +          }
    +        case (_, _) => None
    --- End diff --
    
    `case _ => None`


---



[GitHub] spark pull request #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222231313
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -216,7 +208,7 @@ private[csv] object CSVInferSchema {
           } else {
             Some(DecimalType(range + scale, scale))
           }
    -
         case _ => None
       }
    +
    --- End diff --
    
    Let's get rid of the newline changes.


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222198383
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    -    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    -      Some(DoubleType)
    -
    -    case (t1: DecimalType, t2: DecimalType) =>
    -      val scale = math.max(t1.scale, t2.scale)
    -      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    -      if (range + scale > 38) {
    -        // DecimalType can't support precision > 38
    -        Some(DoubleType)
    -      } else {
    -        Some(DecimalType(range + scale, scale))
    +  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
    --- End diff --
    
    nit: `findCompatibleType`?


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    cc @HyukjinKwon @MaxGekk 


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222199535
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    -    case (DoubleType, _: DecimalType) | (_: DecimalType, DoubleType) =>
    -      Some(DoubleType)
    -
    -    case (t1: DecimalType, t2: DecimalType) =>
    -      val scale = math.max(t1.scale, t2.scale)
    -      val range = math.max(t1.precision - t1.scale, t2.precision - t2.scale)
    -      if (range + scale > 38) {
    -        // DecimalType can't support precision > 38
    -        Some(DoubleType)
    -      } else {
    -        Some(DecimalType(range + scale, scale))
    +  def compatibleType(t1: DataType, t2: DataType): Option[DataType] = {
    +    TypeCoercion.findTightestCommonType(t1, t2).orElse {
    +      (t1, t2) match {
    --- End diff --
    
    @HyukjinKwon Do you have any preference or suggestion on the name of the val? `findCommonTypeExtended`?


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96881/
    Test PASSed.


---



[GitHub] spark pull request #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercio...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22619


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222198257
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    --- End diff --
    
    Some comments here were dropped in the change. Shall we keep them?


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    **[Test build #96897 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96897/testReport)** for PR 22619 at commit [`9e656a8`](https://github.com/apache/spark/commit/9e656a863df6878ab121371e1d820762f221bbe4).


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3652/
    Test PASSed.


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    @ueshin
    > Maybe this is related to #22448.

    Yeah. Actually, @MaxGekk had pointed me to the duplicate code in one of his comments. I was trying to address it here.


---



[GitHub] spark issue #22619: [SPARK-25600][SQL][MINOR] Make use of TypeCoercion.findT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96897/
    Test PASSed.


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    Any behavior change? Test cases?


---



[GitHub] spark issue #22619: [SQL][MINOR] Make use of TypeCoercion.findTightestCommon...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22619
  
    BTW, let's just file a JIRA, @dilipbiswal.


---



[GitHub] spark pull request #22619: [SQL][MINOR] Make use of TypeCoercion.findTightes...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22619#discussion_r222198591
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -172,51 +172,35 @@ private[csv] object CSVInferSchema {
         StringType
       }
     
    -  private val numericPrecedence: IndexedSeq[DataType] = TypeCoercion.numericPrecedence
    -
       /**
    -   * Copied from internal Spark api
    -   * [[org.apache.spark.sql.catalyst.analysis.TypeCoercion]]
    +   * Returns the common data type given two input data types so that the return type
    +   * is compatible with both input data types.
        */
    -  val findTightestCommonType: (DataType, DataType) => Option[DataType] = {
    -    case (t1, t2) if t1 == t2 => Some(t1)
    -    case (NullType, t1) => Some(t1)
    -    case (t1, NullType) => Some(t1)
    -    case (StringType, t2) => Some(StringType)
    -    case (t1, StringType) => Some(StringType)
    -
    -    // Promote numeric types to the highest of the two and all numeric types to unlimited decimal
    -    case (t1, t2) if Seq(t1, t2).forall(numericPrecedence.contains) =>
    -      val index = numericPrecedence.lastIndexWhere(t => t == t1 || t == t2)
    -      Some(numericPrecedence(index))
    -
    -    // These two cases below deal with when `DecimalType` is larger than `IntegralType`.
    -    case (t1: IntegralType, t2: DecimalType) if t2.isWiderThan(t1) =>
    -      Some(t2)
    -    case (t1: DecimalType, t2: IntegralType) if t1.isWiderThan(t2) =>
    -      Some(t1)
    -
    -    // These two cases below deal with when `IntegralType` is larger than `DecimalType`.
    -    case (t1: IntegralType, t2: DecimalType) =>
    -      findTightestCommonType(DecimalType.forType(t1), t2)
    -    case (t1: DecimalType, t2: IntegralType) =>
    -      findTightestCommonType(t1, DecimalType.forType(t2))
    -
    -    // Double support larger range than fixed decimal, DecimalType.Maximum should be enough
    -    // in most case, also have better precision.
    --- End diff --
    
    @viirya Yeah, we should keep them. Sorry, they got dropped inadvertently.


---
