
Posted to github@arrow.apache.org by "yjshen (via GitHub)" <gi...@apache.org> on 2023/03/14 23:17:12 UTC

[GitHub] [arrow-datafusion] yjshen commented on issue #5600: [DISCUSSION] Add separate crate to cover spark builtin functions

yjshen commented on issue #5600:
URL: https://github.com/apache/arrow-datafusion/issues/5600#issuecomment-1468994048

   Moving the Spark functions into a separate crate seems reasonable, but supporting Spark UDFs requires significant effort. Many of Spark's UDFs are designed to be compatible with Hive, so they handle corner cases differently from other databases such as PostgreSQL (PG). These corner cases add to the work of integrating Spark/Hive with DataFusion.
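To make this kind of corner case concrete, here is a minimal Rust sketch of one well-known difference: division by zero. With ANSI mode disabled (`spark.sql.ansi.enabled = false`, the default in Spark 3.x), Spark returns NULL, while PostgreSQL raises a "division by zero" error. The function names are illustrative only and are not taken from DataFusion or Blaze. SQL NULL is modeled as `Option::None`.

```rust
/// Spark-style division with ANSI mode off: a NULL operand or a zero
/// divisor produces NULL rather than an error.
fn spark_divide(a: Option<f64>, b: Option<f64>) -> Option<f64> {
    match (a, b) {
        // Zero divisor: NULL (non-ANSI Spark behavior).
        (Some(_), Some(d)) if d == 0.0 => None,
        (Some(n), Some(d)) => Some(n / d),
        // Any NULL operand: NULL.
        _ => None,
    }
}

/// PostgreSQL-style division: a NULL operand still produces NULL,
/// but a zero divisor is reported as an error.
fn pg_divide(a: Option<f64>, b: Option<f64>) -> Result<Option<f64>, String> {
    match (a, b) {
        (Some(_), Some(d)) if d == 0.0 => Err("division by zero".to_string()),
        (Some(n), Some(d)) => Ok(Some(n / d)),
        // Any NULL operand: NULL, not an error.
        _ => Ok(None),
    }
}

fn main() {
    // Same input, different semantics.
    println!("spark: {:?}", spark_divide(Some(1.0), Some(0.0)));
    println!("pg:    {:?}", pg_divide(Some(1.0), Some(0.0)));
}
```

A function that looks identical by name can therefore disagree on exactly these edge inputs, which is why each one has to be checked individually.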
   
   When developing Blaze, we have to compare the two engines' implementations, or port the Spark tests first, to make sure both sides have identical semantics before a UDF can be handed to DataFusion for execution.
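A minimal sketch of the "port the tests first" step. It assumes ported Spark test cases are stored as rows of (inputs, expected Spark result) and that a native implementation is offloaded only when it agrees with every row. `SparkCase`, `candidate_divide`, `naive_divide`, `mismatches`, and `can_offload` are hypothetical names used for illustration. They are not Blaze or DataFusion APIs. The expected values encode Spark's divide semantics with ANSI mode off.

```rust
/// One ported Spark test case: the inputs and the result Spark produces.
struct SparkCase {
    a: Option<f64>,
    b: Option<f64>,
    expected: Option<f64>,
}

/// Signature of a candidate native implementation of a binary function.
type BinaryFn = fn(Option<f64>, Option<f64>) -> Option<f64>;

/// Ported cases for Spark's divide with ANSI mode off (hypothetical suite).
fn ported_cases() -> Vec<SparkCase> {
    vec![
        SparkCase { a: Some(6.0), b: Some(3.0), expected: Some(2.0) },
        SparkCase { a: Some(1.0), b: Some(0.0), expected: None }, // zero divisor -> NULL
        SparkCase { a: None, b: Some(2.0), expected: None },
        SparkCase { a: Some(1.0), b: None, expected: None },
    ]
}

/// Candidate that mirrors Spark's semantics, including the zero-divisor case.
fn candidate_divide(a: Option<f64>, b: Option<f64>) -> Option<f64> {
    match (a, b) {
        (Some(_), Some(d)) if d == 0.0 => None,
        (Some(n), Some(d)) => Some(n / d),
        _ => None,
    }
}

/// Candidate that uses plain IEEE 754 division: 1.0 / 0.0 gives infinity.
fn naive_divide(a: Option<f64>, b: Option<f64>) -> Option<f64> {
    Some(a? / b?)
}

/// Indices of the ported cases where `f` disagrees with Spark.
fn mismatches(cases: &[SparkCase], f: BinaryFn) -> Vec<usize> {
    cases
        .iter()
        .enumerate()
        .filter(|(_, c)| f(c.a, c.b) != c.expected)
        .map(|(i, _)| i)
        .collect()
}

/// Offload only when every ported case matches; otherwise keep the
/// function on the Spark side.
fn can_offload(cases: &[SparkCase], f: BinaryFn) -> bool {
    mismatches(cases, f).is_empty()
}

fn main() {
    let cases = ported_cases();
    println!("candidate_divide offloadable: {}", can_offload(&cases, candidate_divide));
    println!("naive_divide mismatches: {:?}", mismatches(&cases, naive_divide));
}
```

Here `naive_divide` fails only the zero-divisor case (it yields infinity instead of NULL), so under this scheme that function would stay on the Spark side until its semantics are fixed.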


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
