Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/15 00:52:22 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request #25457: [SPARK-27234][SS][PYTHON] Use InheritableThreadLocal for current epoch in EpochTracker (to support Python UDFs)

URL: https://github.com/apache/spark/pull/25457
 
 
   ## What changes were proposed in this pull request?
   
   This PR proposes to use `InheritableThreadLocal` instead of `ThreadLocal` for the current epoch in `EpochTracker`. Python UDFs need additional threads to write data to and read data from the Python processes. A value set in a plain `ThreadLocal` is not visible in newly created threads, so the previously set epoch is lost in them.
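
The difference can be illustrated with a small Python analogy. This is not Spark's code: `current_epoch`, `run_in_child`, and the epoch value `7` are hypothetical names and values chosen for illustration, and Python's `threading.local` only stands in for Java's `ThreadLocal`. The `inherit=True` path copies the parent's value into the child by hand, which roughly mimics what `InheritableThreadLocal` does automatically when a thread is created.

```python
import threading

# Plain thread-local storage: each thread sees only the values it set itself.
_local = threading.local()


def current_epoch():
    """Return the epoch visible to the calling thread, or None if unset."""
    return getattr(_local, "epoch", None)


def run_in_child(target, inherit):
    """Run `target` in a new thread and return its result.

    With inherit=True the parent's epoch is captured before the child is
    created and installed in the child before `target` runs. This is a
    hand-written stand-in for Java's InheritableThreadLocal behavior.
    """
    parent_epoch = current_epoch()  # read in the parent thread
    result = []

    def body():
        if inherit:
            _local.epoch = parent_epoch  # set in the child thread
        result.append(target())

    worker = threading.Thread(target=body)
    worker.start()
    worker.join()
    return result[0]


_local.epoch = 7  # hypothetical epoch set by the parent thread
lost = run_in_child(current_epoch, inherit=False)
kept = run_in_child(current_epoch, inherit=True)
print(lost, kept)
```

Running this sketch prints `None 7`: the child thread without inheritance sees no epoch, while the child that received the parent's value sees the epoch the parent set.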
   
   After this PR, Python UDFs can be used in Structured Streaming with continuous mode.
   
   ## How was this patch tested?
   
   Manually tested:
   
   ```python
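   from pyspark.sql.functions import col, udf

   # Trivial Python UDF that returns a constant string for every row.
   fooUDF = udf(lambda p: "foo")

   # `spark` is assumed to be an existing SparkSession (e.g. from the PySpark shell).
   spark \
       .readStream \
       .format("rate") \
       .load() \
       .withColumn("foo", fooUDF(col("value"))) \
       .writeStream \
       .format("console") \
       .trigger(continuous="1 second") \
       .start()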
   ```
   
   Note that the test was not ported because:
   
   1. `IntegratedUDFTestUtils` exists only in master.
   2. The PySpark code base is missing Structured Streaming testing utilities.
   3. Writing a new test specifically for branch-2.4 might bring even more overhead because it would not match master.
