Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/05 21:45:17 UTC

[GitHub] [arrow] jorisvandenbossche commented on pull request #12590: ARROW-15639 [C++][Python] UDF Scalar Function Implementation

jorisvandenbossche commented on PR #12590:
URL: https://github.com/apache/arrow/pull/12590#issuecomment-1089397801

   > I agree that the notion of "scalar function" is likely to be foreign to our users and we should make sure to define it very clearly in our documentation. 
   > A scalar function is a function that generates one output value for every input row. 
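
To make the quoted definition concrete, here is a minimal plain-Python sketch. It is not the Arrow API: the chunked array is modelled as a list of lists, and `apply_scalar_function` and `add_one` are hypothetical names used only for illustration. It shows a function that yields one output value per input row, applied to each chunk independently, next to an aggregation that does not fit the definition.

```python
from typing import Callable, List

# Hypothetical stand-in for a chunked array: a list of chunks,
# where each chunk is a plain list of values.
ChunkedValues = List[List[int]]


def apply_scalar_function(fn: Callable[[int], int],
                          chunks: ChunkedValues) -> ChunkedValues:
    """Apply fn element-wise; every input row yields exactly one output row."""
    # Each chunk is processed on its own, so chunks could be handled in parallel.
    return [[fn(value) for value in chunk] for chunk in chunks]


def add_one(x: int) -> int:
    return x + 1


chunked = [[1, 2], [3, 4, 5]]
result = apply_scalar_function(add_one, chunked)

# Same chunk layout and same total number of rows as the input.
rows_in = sum(len(chunk) for chunk in chunked)
rows_out = sum(len(chunk) for chunk in result)

# An aggregation collapses all rows into one value, so it is not
# "scalar" in the sense of the definition above.
total = sum(value for chunk in chunked for value in chunk)
```

In this sketch the "scalar" property describes the row-wise behaviour of the function, not the shape of its argument: the input is still a whole (chunked) collection.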
   
   I _think_ that I am familiar with our usage of the term "scalar function" in our compute kernels, but AFAIK that's not really how it translates here. 
   I expect that a scalar kernel is one that is indeed applied independently, element-wise, to the values (and thus has the parallelization characteristics etc. that you describe), but it's still a function you can call on a full (chunked) array, creating a new (chunked) array of the same size. But with the current `InputType.scalar`, you can only call the registered UDF on a scalar, not on a (chunked) array. So that's where the usage of this term in the new API seems to conflict with its general usage in the compute kernels. Because if I want to actually register a "scalar kernel" UDF, I need to use 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
