You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Vibhatha Lakmal Abeykoon (Jira)" <ji...@apache.org> on 2022/10/25 07:09:00 UTC
[jira] [Assigned] (ARROW-15635) [C++][Python] UDF Integration
[ https://issues.apache.org/jira/browse/ARROW-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vibhatha Lakmal Abeykoon reassigned ARROW-15635:
------------------------------------------------
Assignee: Vibhatha Lakmal Abeykoon
> [C++][Python] UDF Integration
> ------------------------------
>
> Key: ARROW-15635
> URL: https://issues.apache.org/jira/browse/ARROW-15635
> Project: Apache Arrow
> Issue Type: Task
> Components: C++, Python
> Reporter: Vibhatha Lakmal Abeykoon
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> The objective is to list down a set of tasks required to provide UDF support for Apache Arrow streaming execution engine. In the first iteration we will be focusing on providing support for Python-based UDFs which can support Python functions.
> The UDF Integration is going to pan out with a series of sub-tasks associated with the development and PoCs. Note that this is going to be the first iteration of UDF integrations with a limited scope. This ticket will cover the following topics;
> # POC for UDF integration: The objective is to evaluate the existing components in the source and evaluate the required modifications and new building blocks required to integrate UDFs.
> # The language will be limited to C+{+}/{+}Python users can register Python function as a UDF and use it with an `apply` method on Arrow Tables or provide a computation API endpoint via arrow::compute API. Note that the C+ API already provides a way to register custom functions via the function registry API. At the moment this is not exposed to Python.
> # Planned features for this ticket are;
> ## Scalar UDFs : UDFs executed per value (per row)
> ## Vector UDFs : UDFs executed per batch (a full array or partial array)
> ## Aggregate UDFs : UDFs associated with an aggregation operation
> # Integration limitations
> ## Doesn't support custom data types which doesn't support Numpy or Pandas
> ## Complex processing with parallelism within UDFs are not supported
> ## Parallel UDFs are not supported in the initial version of UDFs. Allthough we are documenting what is required and a rough sketch for the next phase.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)