Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/26 04:52:48 UTC

[GitHub] [spark] Tagar commented on pull request #26783: [SPARK-30153][PYTHON][WIP] Extend data exchange options for vectorized UDF functions with vanilla Arrow serialization

Tagar commented on pull request #26783:
URL: https://github.com/apache/spark/pull/26783#issuecomment-663935674


   Sorry for the uninitiated question here.
   Just out of curiosity, was that 3x performance improvement measured for CPU execution?
   Reading a little bit about `awkward_array`, it can use CUDA kernels too:
   https://awkward-array.readthedocs.io/en/latest/index.html#more-documentation
   It would be great to see what that improvement looks like on GPUs.
   IMO this would be a great use case for PySpark UDF execution directly on GPUs,
   and it deserves a separate `@numpy_udf` designation, just like there is `@pandas_udf`.
   Piggybacking on the PandasUDF interface is confusing, since this PR actually tries to avoid using Pandas (rough sketch below).
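   A minimal sketch of what I mean, assuming Spark 3.x type-hinted Pandas UDFs; the `@numpy_udf` decorator is purely hypothetical and not something this PR adds:

   ```python
   import numpy as np
   import pandas as pd
   from pyspark.sql.functions import pandas_udf
   from pyspark.sql.types import DoubleType

   # Existing interface: data arrives as pandas Series built on top of Arrow buffers.
   @pandas_udf(DoubleType())
   def plus_one(v: pd.Series) -> pd.Series:
       return v + 1.0

   # Usage would look like: df.select(plus_one("x"))

   # Hypothetical `@numpy_udf` (illustration only, does not exist in Spark):
   # the same logic, but receiving plain NumPy arrays straight from Arrow,
   # skipping the pandas wrapper entirely.
   #
   # @numpy_udf(DoubleType())
   # def plus_one_np(v: np.ndarray) -> np.ndarray:
   #     return v + 1.0
   ```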
   Numba is another example: it supports just-in-time compilation of NumPy logic
   for execution on GPUs (see the sketch below):
   https://numba.pydata.org/numba-doc/latest/cuda/index.html
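   For reference, a minimal Numba sketch of that idea, assuming a CUDA-capable GPU with Numba installed (unrelated to this PR's code):

   ```python
   import numpy as np
   from numba import vectorize

   # JIT-compile an elementwise NumPy-style function into a CUDA ufunc.
   @vectorize(["float64(float64)"], target="cuda")
   def plus_one(x):
       return x + 1.0

   data = np.arange(1_000_000, dtype=np.float64)
   result = plus_one(data)  # arrays are moved to the GPU, computed, and copied back
   ```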
   My 2 cents: I think it would be a great improvement!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
