Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/12/30 03:36:00 UTC
[jira] [Resolved] (SPARK-37752) Python UDF fails when it should not get evaluated
[ https://issues.apache.org/jira/browse/SPARK-37752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-37752.
----------------------------------
Resolution: Not A Bug
> Python UDF fails when it should not get evaluated
> -------------------------------------------------
>
> Key: SPARK-37752
> URL: https://issues.apache.org/jira/browse/SPARK-37752
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.4
> Reporter: Ohad Raviv
> Priority: Minor
>
> Haven't checked on newer versions yet.
> If I define in Python:
> {code:python}
> def udf1(col1):
>     print(col1[2])
>     return "blah"
>
> spark.udf.register("udf1", udf1)
> {code}
> and then use it in SQL:
> {code:sql}
> select case when length(c) > 2 then udf1(c) end
> from (
>   select explode(array("123", "234", "12")) as c
> )
> {code}
> it fails on:
> {noformat}
> File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 253, in main
> process()
> File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 248, in process
> serializer.dump_stream(func(split_index, iterator), outfile)
> File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 155, in <lambda>
> func = lambda _, it: map(mapper, it)
> File "<string>", line 1, in <lambda>
> File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 76, in <lambda>
> return lambda *a: f(*a)
> File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/util.py", line 55, in wrapper
> return f(*args, **kwargs)
> File "<stdin>", line 3, in udf1
> IndexError: string index out of range{noformat}
> However, udf1 should not be evaluated at all for the out-of-range row, because the CASE WHEN only passes strings longer than 2 characters.
> The same scenario works fine when we define a Scala UDF instead.
> I will now check whether this also happens on newer versions.
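
Editorial note (not part of the original report): the PySpark documentation for user-defined functions states that they do not support conditional expressions or short-circuiting, so a Python UDF may be called on every row, and it suggests moving the condition into the function. A minimal sketch of that workaround follows, assuming the same udf1 as in the report. The length check and the choice to return None (which mirrors a CASE WHEN without an ELSE branch) are illustrative assumptions, not taken from the issue.

```python
def udf1(col1):
    # Guard inside the UDF: it may be invoked for every row,
    # regardless of the surrounding CASE WHEN in the query.
    if col1 is None or len(col1) <= 2:
        return None  # assumption: mirrors CASE WHEN with no ELSE branch
    print(col1[2])
    return "blah"


# Plain-Python check using the three values from the report.
results = [udf1(c) for c in ["123", "234", "12"]]

# With an active SparkSession named `spark` (as in the report),
# registration would be unchanged:
# spark.udf.register("udf1", udf1)
```

With this guard the short value "12" yields None instead of raising IndexError, whether or not the planner evaluates the UDF for that row.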
--
This message was sent by Atlassian Jira
(v8.20.1#820001)