Posted to issues@spark.apache.org by "Andre Sa de Mello (JIRA)" <ji...@apache.org> on 2019/02/23 15:16:00 UTC

[jira] [Created] (SPARK-26979) Some SQL functions do not take column names

Andre Sa de Mello created SPARK-26979:
-----------------------------------------

             Summary: Some SQL functions do not take column names
                 Key: SPARK-26979
                 URL: https://issues.apache.org/jira/browse/SPARK-26979
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Andre Sa de Mello


Most SQL functions defined in _org.apache.spark.sql.functions_ come in two variants: one takes a Column object as input, and the other takes a string holding a column name, which is converted into a Column object internally.

There are, however, a few notable exceptions:
 * lower()
 * upper()
 * abs()
 * bitwiseNOT()

While this doesn't break anything, as you can easily create a Column object yourself prior to passing it to one of these functions, it has two undesirable consequences:
 # It is surprising: it breaks coders' expectations when they are first starting with Spark. Every API should be as consistent as possible, both to smooth the learning curve and to reduce sources of human error;
 # It gets in the way of stylistic conventions. Using literal column names usually makes Python/Scala/Java code more readable, and the API provides ample support for that, but these few exceptions prevent the pattern from being universally applicable.
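
To make the inconsistency concrete, here is a rough model of the situation in plain Python. It is not Spark's actual code: _Column_ and _col_ are simplified stand-ins, and _upper_column_only_ is a name invented for this sketch to represent a function that today accepts only a Column. The second function shows the kind of name-accepting variant this issue asks for, assuming it simply converts the name and delegates.

```python
class Column:
    """Hypothetical stand-in for Spark's Column; it only stores an expression string."""

    def __init__(self, expr):
        self.expr = expr


def col(name):
    """Build a Column from a column name (stand-in for the real helper)."""
    return Column(name)


def upper_column_only(e):
    """Models the current behaviour of the exceptions: a Column is required."""
    if not isinstance(e, Column):
        raise TypeError("expected a Column, got " + type(e).__name__)
    return Column("upper(%s)" % e.expr)


def upper(col_or_name):
    """Models the requested variant: a name is converted, then delegated."""
    e = col(col_or_name) if isinstance(col_or_name, str) else col_or_name
    return upper_column_only(e)


# Workaround available today: wrap the name yourself.
wrapped = upper_column_only(col("name"))

# With a name-accepting variant, the literal name works directly.
direct = upper("name")

print(wrapped.expr)
print(direct.expr)
```

Both calls build the same expression; the only difference is whether the caller or the function wraps the name in a Column.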

This is a very easy fix, and I see no reason not to apply it. I have a PR ready.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org