Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/24 13:04:57 UTC

[GitHub] asmello opened a new pull request #23882: [SPARK-26979][PySpark][WIP] Add missing column name support for some SQL functions

URL: https://github.com/apache/spark/pull/23882
 
 
   ## What changes were proposed in this pull request?
   
   Most SQL functions defined in `pyspark.sql.functions` support two calling patterns: one takes a Column object as input, and the other takes a string naming a column, which is then converted into a Column object internally.
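   
   As a rough illustration of the dual calling pattern described above, here is a toy sketch in plain Python. It is not PySpark's actual implementation: the `Column` class is a stand-in, and `_to_column` is a hypothetical helper name invented for this example.

```python
class Column:
    """Toy stand-in for a column expression (NOT pyspark.sql.Column)."""

    def __init__(self, expr):
        self.expr = expr


def _to_column(col):
    """Hypothetical helper: pass a Column through, wrap a name in a Column."""
    if isinstance(col, Column):
        return col
    if isinstance(col, str):
        return Column(col)
    raise TypeError(
        "expected a Column or a column name, got {!r}".format(type(col))
    )


def lower(col):
    """Toy version of a function that accepts either calling pattern."""
    return Column("lower({})".format(_to_column(col).expr))


# Both calling patterns build the same expression.
by_name = lower("name")
by_column = lower(Column("name"))
print(by_name.expr)
print(by_column.expr)
```

   Normalizing the argument in one helper keeps each function body identical for both patterns, which is the kind of consistency this PR is asking for.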
   
   There are, however, a few notable exceptions:
   
   - lower()
   - upper()
   - abs()
   - bitwiseNOT()
   - ltrim()
   - rtrim()
   - trim()
   - ascii()
   - base64()
   - unbase64()
   
   While this doesn't break anything, as you can easily create a Column object yourself prior to passing it to one of these functions, it has two undesirable consequences:
   
   1. It is surprising: it breaks coders' expectations when they are first starting with Spark. Every API should be as consistent as possible, so as to make the learning curve smoother and to reduce causes of human error.
   
   2. It gets in the way of stylistic conventions. Using literal column names usually makes Python/Scala/Java code more readable, and the API provides ample support for that, but these few exceptions prevent the pattern from being universally applicable.
   
   This is a very simple fix, and I see no reason not to apply it.
   
   ### Side effects
   
   This PR also fixes an issue with some functions being defined multiple times by using `_create_function()`.
   
   ## How was this patch tested?
   
   By running `./dev/run-tests` and testing manually. (WIP)
