Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/03 02:28:04 UTC
[GitHub] [spark] zero323 commented on a change in pull request #27406: [SPARK-30681][PYSPARK][SQL] Add higher order functions API to PySpark

zero323 commented on a change in pull request #27406: [SPARK-30681][PYSPARK][SQL] Add higher order functions API to PySpark
URL: https://github.com/apache/spark/pull/27406#discussion_r373901264
 
 

 ##########
 File path: python/pyspark/sql/column.py
 ##########
 @@ -129,6 +129,103 @@ def _(self, other):
     return _
 
 
+def _unresolved_named_lambda_variable(*name_parts):
+    """
+    Create o.a.s.sql.catalyst.expressions.UnresolvedNamedLambdaVariable and
+    convert it to o.a.s.sql.Column
+
+    :param name_parts: str
+    """
+    sc = SparkContext._active_spark_context
+    name_parts_seq = _to_seq(sc, name_parts)
+    expressions = sc._jvm.org.apache.spark.sql.catalyst.expressions
+    return Column(
+        sc._jvm.Column(
+            expressions.UnresolvedNamedLambdaVariable(name_parts_seq)
+        )
+    )
+
+
+def _get_lambda_parameters(f):
+    import inspect
+
+    signature = inspect.signature(f)
+    parameters = signature.parameters.values()
+
+    # We should exclude functions that use
+    # variable positional args (*args) and variable keyword args (**kwargs)
+    # as well as keyword-only args
+    supported_parmeter_types = {
+        inspect.Parameter.POSITIONAL_OR_KEYWORD,
+        inspect.Parameter.POSITIONAL_ONLY,
+    }
+
+    # Validate that
+    # function arity is between 1 and 3
+    if not (1 <= len(parameters) <= 3):
 
 Review comment:
   > I am good with that but if it needs some considerable codes like the current, I am not sure yet.
   
   Well, this change alone (without adding a domain object, which led to some line splitting) takes 7 lines of actual code (excluding tests and docstrings, 95dbda506cb3492c2e972e69be25eda4bf75b336), and that's only because I was pretty generous with formatting. A one- or two-liner would do just fine if it weren't for linter rules. Beyond that, we just piggyback on the signature analysis code, which is not something that can be delegated to the analyzer.
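To make the size argument concrete, here is a minimal standalone sketch of this kind of signature check, using only the standard-library `inspect` module. The name `validate_lambda`, the error messages, and the exact order of checks are illustrative assumptions, not the code from the PR; the 1-to-3 arity bound and the accepted parameter kinds are taken from the diff above.

```python
import inspect

# Parameter kinds accepted for a higher-order-function lambda: plain positional ones.
SUPPORTED_KINDS = {
    inspect.Parameter.POSITIONAL_ONLY,
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
}


def validate_lambda(f, min_arity=1, max_arity=3):
    """Hypothetical helper: return the parameters of f, or raise ValueError."""
    parameters = list(inspect.signature(f).parameters.values())

    # Reject *args, **kwargs and keyword-only parameters up front.
    for p in parameters:
        if p.kind not in SUPPORTED_KINDS:
            raise ValueError(
                "f should use only positional arguments, "
                "got {!r} of kind {}".format(p.name, p.kind.name)
            )

    # The diff limits lambdas to between one and three arguments.
    if not (min_arity <= len(parameters) <= max_arity):
        raise ValueError(
            "f should take between {} and {} arguments, got {}".format(
                min_arity, max_arity, len(parameters)
            )
        )
    return parameters
```

With this sketch, `validate_lambda(lambda acc, x: acc)` returns both parameters, while `validate_lambda(lambda *xs: xs)` raises `ValueError` in Python before anything is sent to the JVM.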
   
   > I am not sure if this is a good way to duplicately handle error cases.
   
   I'd argue that this doesn't really qualify as one. These specific analyzer errors target SQL expressions. Scala users will never hit them, because the compiler already handles this part for them.
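A rough illustration of why Python cannot lean on a compiler here: an untyped Python lambda only reveals its arity when it is inspected or called. The sketch below assumes, for illustration only, that the lambda is turned into an expression by calling it with one placeholder variable per parameter (the role the diff's `_unresolved_named_lambda_variable` helper appears to play). `Expr` and `trace_lambda` are hypothetical names, and `Expr` is a toy stand-in for `Column`, not the PySpark API.

```python
import inspect


class Expr:
    """Hypothetical stand-in for a Column: it only records the expression text."""

    def __init__(self, sql):
        self.sql = sql

    def _combine(self, op, other):
        other_sql = other.sql if isinstance(other, Expr) else repr(other)
        return Expr("({} {} {})".format(self.sql, op, other_sql))

    def __add__(self, other):
        return self._combine("+", other)

    def __mul__(self, other):
        return self._combine("*", other)


def trace_lambda(f):
    """Call f with one placeholder per declared parameter and render the result."""
    names = [p.name for p in inspect.signature(f).parameters.values()]
    # In Scala the function type fixes the arity at compile time;
    # in Python we only learn it here, at runtime.
    body = f(*(Expr(n) for n in names))
    return "({}) -> {}".format(", ".join(names), body.sql)
```

For example, `trace_lambda(lambda acc, x: acc + x)` renders `(acc, x) -> (acc + x)`. A lambda with an unexpected shape would instead fail here with whatever raw Python error the call happens to produce, which is the gap an explicit signature check closes.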
   
   In general, I anticipate that this feature might be somewhat confusing until users get used to it, so safeguarding against common mistakes is a good idea, especially when the expected maintenance overhead is negligible.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
