
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/24 11:53:02 UTC

[GitHub] [spark] HyukjinKwon edited a comment on issue #24675: [SPARK-27803][SQL][PYTHON] Fix column pruning for Python UDF

HyukjinKwon edited a comment on issue #24675: [SPARK-27803][SQL][PYTHON] Fix column pruning for Python UDF
URL: https://github.com/apache/spark/pull/24675#issuecomment-495591310
 
 
   BTW, just to stay in sync with you too @BryanCutler, @viirya and @icexelloss: I am planning to add a bunch of tests specific to regular Python UDFs and Pandas Scalar UDFs, which can possibly be reused for Scala UDFs too. I am trying to find a way to duplicate as much as possible. I hope it makes sense to you guys.
   
   The special rules `ExtractPythonUDF[s|FromAggregate]` involve unevaluable expressions that always have to be wrapped in special plans. It seems we have removed some hacks now, but I think we're not sure about the coverage.
   
   I think we started to observe these issues after we moved those Python rules from physical plans to logical plans, which was (I think) the right fix but couldn't catch many cases like this. My idea is basically to share (or partially duplicate) *.sql files for Python / Pandas / Scala UDFs. I hope this prevents such issues in the future.
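
As a rough sketch of the sharing idea (not what this PR does; the UDF names, the table `t` and its columns are all made up for illustration), one query template could be rendered once per UDF flavour so that every flavour runs the same query shape:

```python
from string import Template

# Hypothetical UDF flavours; these names are illustrative, not Spark identifiers.
UDF_FLAVOURS = ("python_udf", "pandas_scalar_udf", "scala_udf")

# One shared query template; ${udf} is replaced by the name of a registered UDF.
# The table `t` and columns `a`, `b` are assumptions for the example.
QUERY_TEMPLATE = Template("SELECT ${udf}(a) FROM t WHERE b > 1;")


def render_queries(template=QUERY_TEMPLATE, flavours=UDF_FLAVOURS):
    """Return a mapping of flavour name -> SQL text rendered from the shared template."""
    return {name: template.substitute(udf=name) for name in flavours}


queries = render_queries()
for name, sql in queries.items():
    # Each rendered query could be written to its own *.sql test file.
    print(f"-- {name}\n{sql}")
```

The point is only that the query text lives in one place, so a plan shape that breaks for one UDF flavour is exercised for the others too.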
