You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/28 16:46:12 UTC

[GitHub] [arrow] andygrove edited a comment on pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

andygrove edited a comment on pull request #7967:
URL: https://github.com/apache/arrow/pull/7967#issuecomment-682868097


   Thanks @Jorge it seems like I
   misunderstood the scope of the change. I will review your response this
   weekend to make sure I understand.
   
   > What I struggle a bit to understand is that I have raised this concern
   before and no-one has offered a solution to it.
   
   It is likely that others just haven't had the time to look yet. I know I've
   had little time this week due to work commitments.
   
   
   On Fri, Aug 28, 2020 at 10:24 AM Jorge Leitao <no...@github.com>
   wrote:
   
   > The discussion for me is not so much about the return type of the function
   > sqrt. I am fine with sqrt_f32. My issue is with the output of the
   > function array, collect_list, concat, and a significant number of other
   > functions whose return type is variable.
   >
   > I do not think that other query engines are designed to have a fixed
   > return type: collect_list has a variable return type, getitem has a
   > variable return type; array has a variable return type. Essentially,
   > every function that works on non-primitive types have a variable return
   > type, because there are common operations on structs that are irrespective
   > of the specific types that they hold.
   >
   > The functions that I see we will hit soon are functions that operate on
   > strings: arrow supports utf8 and Largeutf8. Are we also making all
   > functions that operate on strings (e.g. concat) always return Largeutf8,
   > or utf8? What about binary and LargeBinary? I feel that by forcing a
   > single return type, DataFusion drifts farther away from Arrow as we end up
   > supporting a smaller and smaller subset of the types that arrow supports on
   > its execution plans.
   >
   > For me, we should embrace the fact that most engines migrated to variable
   > return types at some point in time during their life-time and design a
   > typing/function system that caters for that from the get go, so that we
   > have all options available to us when we want to expand our function set.
   >
   > At the moment, I am confident that the approach in #8032
   > <https://github.com/apache/arrow/pull/8032> covers all relevant
   > use-cases, and is also fully aligned with how we do it for binary math and
   > logical operators.
   >
   > Of course I would respect the decision to make all built-in functions/UDF
   > of a fixed return type, and design the engine assuming that invariant, but
   > it would be difficult for me to continue to contribute to that part of the
   > code base when I am 90% certain that we will hit another design blocker
   > when we want to implement array([c1, c2]) or something. The only viable
   > option I see to add a new function such as array, without an item in Expr
   > that generically supports them, is to introduce a new entry to enum Expr.
   > Doing so is backward incompatible, which means that we will introducing a
   > backward incompatible change every time we need to add a new function of
   > variable return type.
   >
   > What I struggle a bit to understand is that I have raised this concern
   > before and no-one has offered a solution to it.
   > Specifically, if we design our UDF's API (which currently supports all our
   > built-in functions) as fixed return type, how will we support array and
   > other scalar functions of variable return type without introducing a
   > backward incompatible change on every new function?
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/arrow/pull/7967#issuecomment-682832105>, or
   > unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAHEBRBKXIXSQ6XBF4QZ2YLSC7K2FANCNFSM4QAG5BEA>
   > .
   >
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org