Posted to issues@beam.apache.org by "Brian Hulette (Jira)" <ji...@apache.org> on 2020/04/29 20:26:00 UTC

[jira] [Commented] (BEAM-9784) Add an Arrow Flight based Python IO connector

    [ https://issues.apache.org/jira/browse/BEAM-9784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095871#comment-17095871 ] 

Brian Hulette commented on BEAM-9784:
-------------------------------------

In theory, if BEAM-9783 is finished first, this could be done as a cross-language transform. But doing it that way would require converting batches to rows for the transfer (since Beam coders can't encode batches of elements for now), and it would prevent Python from accessing the raw Arrow record batches. Since Python users could process those record batches efficiently with pyarrow/pandas, that would be a major downside.

So I think this should be done natively in Python.
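
To illustrate the batch-versus-row trade-off described above, here is a minimal sketch. It is only a plain-Python stand-in: the dict of typed arrays is an assumed substitute for an Arrow record batch, and the helper names (batch_to_rows, total_from_rows, total_from_batch) are hypothetical. It does not use any pyarrow, pandas, or Beam API.

```python
from array import array

# Hypothetical stand-in for an Arrow record batch: one typed array per column.
# (Assumption for illustration only; a real batch would be a pyarrow object.)
batch = {
    "user_id": array("q", [1, 2, 3, 4]),
    "amount": array("d", [10.0, 20.5, 3.25, 7.0]),
}


def batch_to_rows(batch):
    """Row-wise path: materialize one dict per element, as a per-row encoding would need."""
    names = list(batch)
    return [dict(zip(names, values)) for values in zip(*batch.values())]


def total_from_rows(rows, column):
    """Aggregate after the batch has already been split into individual rows."""
    return sum(row[column] for row in rows)


def total_from_batch(batch, column):
    """Columnar path: read the single column buffer directly, with no per-row objects."""
    return sum(batch[column])


rows = batch_to_rows(batch)
row_total = total_from_rows(rows, "amount")
batch_total = total_from_batch(batch, "amount")
```

Both paths compute the same total, but the row-wise path first allocates one object per element, which is roughly the overhead a row-based transfer would add before Python code could touch the data.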

> Add an Arrow Flight based Python IO connector
> ---------------------------------------------
>
>                 Key: BEAM-9784
>                 URL: https://issues.apache.org/jira/browse/BEAM-9784
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-ideas
>            Reporter: Ismaël Mejía
>            Priority: Minor
>
> Arrow Flight is a new general-purpose client-server framework to simplify high-performance transport of large datasets over network interfaces. It defines a unified API to different data systems with data serialized in the middle in Arrow format.
> Having a connector for this will enable Beam users to connect to Flight-compatible systems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)