Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/12/30 03:36:00 UTC

[jira] [Resolved] (SPARK-37752) Python UDF fails when it should not get evaluated

     [ https://issues.apache.org/jira/browse/SPARK-37752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-37752.
----------------------------------
    Resolution: Not A Bug

> Python UDF fails when it should not get evaluated
> -------------------------------------------------
>
>                 Key: SPARK-37752
>                 URL: https://issues.apache.org/jira/browse/SPARK-37752
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.4
>            Reporter: Ohad Raviv
>            Priority: Minor
>
> I haven't checked newer versions yet.
> If I define the following in Python:
> {code:python}
> def udf1(col1):
>     print(col1[2])
>     return "blah"
> spark.udf.register("udf1", udf1) {code}
> and then use it in SQL:
> {code:sql}
> select case when length(c)>2 then udf1(c) end
> from (
>     select explode(array("123","234","12")) as c
> ) {code}
> it fails with:
> {noformat}
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 253, in main
>     process()
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 248, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 155, in <lambda>
>     func = lambda _, it: map(mapper, it)
>   File "<string>", line 1, in <lambda>
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 76, in <lambda>
>     return lambda *a: f(*a)
>   File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/util.py", line 55, in wrapper
>     return f(*args, **kwargs)
>   File "<stdin>", line 3, in udf1
> IndexError: string index out of range{noformat}
> However, the UDF should not be evaluated at all for the out-of-range row ("12"), because the CASE WHEN only passes strings longer than 2 characters.
> The same scenario works fine when we define a Scala UDF instead.
> I will now check whether this also happens on newer versions.
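
The resolution above does not state a reason. One reading consistent with "Not A Bug" is that Spark SQL does not guarantee the order in which subexpressions are evaluated, so a Python UDF may be called on rows that the surrounding CASE WHEN would discard. Under that assumption, a minimal sketch of a workaround is to repeat the length check inside the UDF itself. The name udf1_guarded is hypothetical and not part of the original report; the Spark registration is left as a comment so the snippet runs without a SparkSession.

```python
def udf1_guarded(col1):
    """Variant of the reporter's udf1 that tolerates short or NULL input."""
    # Repeat the CASE WHEN condition here, in case Spark invokes the UDF
    # on rows that the SQL-level condition would have excluded.
    if col1 is None or len(col1) <= 2:
        return None  # mirrors the NULL produced by a CASE WHEN with no ELSE
    print(col1[2])
    return "blah"


# Registration as in the report (needs an active SparkSession named `spark`):
# spark.udf.register("udf1", udf1_guarded)

# Plain-Python check using the same three values as the SQL example.
results = [udf1_guarded(c) for c in ["123", "234", "12"]]
```

With this guard the short value "12" yields None instead of raising IndexError, whether or not the UDF is called for that row.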



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
