Posted to issues@spark.apache.org by "Ryan Bald (JIRA)" <ji...@apache.org> on 2017/08/22 23:11:00 UTC

[jira] [Created] (SPARK-21811) Inconsistency when finding the widest common type of a combination of DateType, StringType, and NumericType

Ryan Bald created SPARK-21811:
---------------------------------

             Summary: Inconsistency when finding the widest common type of a combination of DateType, StringType, and NumericType
                 Key: SPARK-21811
                 URL: https://issues.apache.org/jira/browse/SPARK-21811
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Ryan Bald
            Priority: Minor


Finding the widest common type for the arguments of a variadic function (such as IN or COALESCE) is order-dependent when the argument types are a combination of DateType/TimestampType, StringType, and NumericType: some argument orders fail with an AnalysisException, while others succeed with a common type of StringType.

The examples below reproduce the error and assume a schema of:
{{[c1: date, c2: string, c3: int]}}

The following succeeds:
{{SELECT coalesce(c1, c2, c3) FROM table}}

While the following produces an exception:
{{SELECT coalesce(c1, c3, c2) FROM table}}

The order of arguments affects the behavior because the widest common type appears to be found by repeatedly looking at two arguments at a time: the widest common type found so far and the next argument. My initial thought for a fix is that the way the widest common type is found would have to change, so that all arguments are examined before deciding what the widest common type should be.
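
To illustrate the order dependence, here is a toy model in Python. It is only a sketch and not Spark's code: the function names and the simplified pairwise rule (equal types unify, a string paired with anything widens to string, date paired with int has no common type) are assumptions chosen to match the behavior reported above.

```python
from functools import reduce
from typing import Optional, Sequence


def wider_of_two(a: Optional[str], b: str) -> Optional[str]:
    """Toy pairwise rule (an assumption, not Spark's actual rule set)."""
    if a is None:            # an earlier pair already had no common type
        return None
    if a == b:
        return a
    if "string" in (a, b):   # string paired with another type widens to string
        return "string"
    return None              # e.g. date vs. int: no common type


def pairwise_common_type(types: Sequence[str]) -> Optional[str]:
    """Fold left to right: running result combined with the next argument."""
    return reduce(wider_of_two, types[1:], types[0])


def all_at_once_common_type(types: Sequence[str]) -> Optional[str]:
    """Order-independent variant: inspect the full set of types first."""
    distinct = set(types)
    if len(distinct) == 1:
        return next(iter(distinct))
    return "string" if "string" in distinct else None


if __name__ == "__main__":
    print(pairwise_common_type(["date", "string", "int"]))     # like coalesce(c1, c2, c3)
    print(pairwise_common_type(["date", "int", "string"]))     # like coalesce(c1, c3, c2)
    print(all_at_once_common_type(["date", "int", "string"]))
```

In this model the pairwise fold returns a type for (date, string, int) but None for (date, int, string), because the first pair in the second order has no common type and the failure propagates. The all-at-once variant gives the same answer for both orders. Whether that answer should be StringType or a failure is a separate question.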

As my boss is out of office for the rest of the day, I will give a pull request a shot, but as I am not very familiar with Scala or Spark's coding style guidelines, a pull request is not promised. For the attempted pull request, I will assume that having DateType/TimestampType, StringType, and NumericType arguments in an IN expression or COALESCE function (and any other function/expression where this combination of argument types can occur) is valid. I also find it quite reasonable for this combination of argument types to be invalid, so if that's what is decided, then oh well.

If I were a betting man, I'd say the fix would be made in the following file: [TypeCoercion.scala|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
