Posted to reviews@spark.apache.org by "xinrong-meng (via GitHub)" <gi...@apache.org> on 2024/01/16 19:35:20 UTC

[PR] [SPARK-46663][PYTHON][3.5] Disable memory profiler for pandas UDFs with iterators [spark]

xinrong-meng opened a new pull request, #44760:
URL: https://github.com/apache/spark/pull/44760

   ### What changes were proposed in this pull request?
   When using pandas UDFs with iterators, if users enable the profiling spark conf, a warning indicating non-support should be raised, and profiling should be disabled.
   
   Currently, however, the memory profiler is still enabled after the not-supported warning is raised.
   
   This PR fixes that.
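
The intended behavior can be sketched without Spark. This is only an illustration of the control flow described above, not the actual PySpark code: `prepare_udf`, `wrap_with_profiler`, and both boolean parameters are hypothetical names. The point is that the function returns right after warning, so the profiler is never attached to an iterator UDF.

```python
import warnings
from typing import Callable


def wrap_with_profiler(func: Callable) -> Callable:
    """Hypothetical stand-in for attaching a memory profiler to a UDF."""

    def profiled(*args, **kwargs):
        # A real profiler would record memory usage around this call.
        return func(*args, **kwargs)

    profiled.profiled = True  # marker so callers can tell it was wrapped
    return profiled


def prepare_udf(func: Callable, uses_iterators: bool, profiling_enabled: bool) -> Callable:
    """Return the function to execute, profiled only when that is supported."""
    if not profiling_enabled:
        return func
    if uses_iterators:
        warnings.warn(
            "Profiling UDFs with iterators input/output is not supported.",
            UserWarning,
        )
        # Key step: return the plain function instead of falling
        # through to the profiler after warning.
        return func
    return wrap_with_profiler(func)
```

In this sketch, forgetting the early `return func` would reproduce the kind of bug described above: the warning is emitted, but the UDF still ends up wrapped by the profiler.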
   
   ### Why are the changes needed?
   A bug fix to eliminate misleading behavior. 
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, but only for users of the PySpark shell. There, the memory profiler previously raised an error that blocked execution of the UDF; with this change the UDF runs after the warning is shown.
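
The error text in the "Before" output, `OSError: could not get source code`, matches the message Python's standard `inspect` module raises when it cannot locate a function's source. That the memory profiler fails for this reason is an assumption here, not something the PR states. A minimal sketch, independent of Spark, in which the `<shell-input>` filename and the `source_available` helper are hypothetical:

```python
import inspect

# Compile a function from a string under a pseudo-filename, so that no
# file on disk backs it (similar in spirit to typing it into a shell).
source = "def plus_one(x):\n    return x + 1\n"
namespace = {}
exec(compile(source, "<shell-input>", "exec"), namespace)
plus_one = namespace["plus_one"]


def source_available(func) -> bool:
    """Report whether inspect can retrieve the source code of func."""
    try:
        inspect.getsource(func)
        return True
    except (OSError, TypeError):
        # OSError: the source could not be found.
        # TypeError: the object is a built-in with no Python source.
        return False


# The function runs normally even though its source cannot be retrieved.
result = plus_one(1)
has_source = source_available(plus_one)
```

Any tool that needs the source text of a function would hit the same limitation for such a function, while the function itself remains callable.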
   
   ### How was this patch tested?
   Manual test.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Setup:
   ```py
   $ ./bin/pyspark --conf spark.python.profile=true
   
   >>> from typing import Iterator
   >>> from pyspark.sql.functions import pandas_udf
   >>> import pandas as pd
   >>> @pandas_udf("long")
   ... def plus_one(iterator: Iterator[pd.Series]) -> Iterator[pd.Series]:
   ...     for s in iterator:
   ...         yield s + 1
   ... 
   >>> df = spark.createDataFrame(pd.DataFrame([1, 2, 3], columns=["v"]))
   ```
   
   Before:
   ```
   >>> df.select(plus_one(df.v)).show()
   UserWarning: Profiling UDFs with iterators input/output is not supported.
   Traceback (most recent call last):
   ...
   OSError: could not get source code
   ```
   
   After:
   ```
   >>> df.select(plus_one(df.v)).show()
   /Users/xinrong.meng/spark/python/pyspark/sql/udf.py:417: UserWarning: Profiling UDFs with iterators input/output is not supported.
   +-----------+
   |plus_one(v)|
   +-----------+
   |          2|
   |          3|
   |          4|
   +-----------+
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46663][PYTHON][3.5] Disable memory profiler for pandas UDFs with iterators [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44760:
URL: https://github.com/apache/spark/pull/44760#issuecomment-1897515277

   Merged to master.




Re: [PR] [SPARK-46663][PYTHON][3.5] Disable memory profiler for pandas UDFs with iterators [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #44760: [SPARK-46663][PYTHON][3.5] Disable memory profiler for pandas UDFs with iterators
URL: https://github.com/apache/spark/pull/44760

