You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Fabian Hueske (JIRA)" <ji...@apache.org> on 2019/04/10 08:15:00 UTC
[jira] [Comment Edited] (FLINK-10929) Add support for Apache Arrow

    [ https://issues.apache.org/jira/browse/FLINK-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16699093#comment-16699093 ] 

Fabian Hueske edited comment on FLINK-10929 at 4/10/19 8:14 AM:
----------------------------------------------------------------

Arrow is a columnar in-memory data storage / exchange format. This means it was not designed with point updates / queries in mind which is the access pattern for a state backend in Flink. I'm not saying that there is no use case for Arrow in streaming, but IMO there is no obvious one. 

For batch SQL, yes, it might make sense but it would be a huge change (basically completely rewriting the data processing engine). 
One could also think of adding interfaces to consume or export Arrow data. There are probably more scenarios in which Arrow could be used.

I think we need more detail to decide whether this proposal is something that is useful (and achievable) for Flink or not.
[~pcless], what was the use case you had in mind when opening this Jira?





was (Author: fhueske):
Arrow is a columnar in-memory data storage / exchange format. This means it was not designed with point updates / queries in mind which is the access pattern for a state backend in Flink. I'm not saying that there is no use case for Arrow in streaming, but IMO there is obvious one. 

For batch SQL, yes, it might make sense but it would be a huge change (basically completely rewriting the data processing engine). 
One could also think of adding interfaces to consume or export Arrow data. There are probably more scenarios in which Arrow could be used.

I think we need more detail to decide whether this proposal is something that is useful (and achievable) for Flink or not.
[~pcless], what was the use case you had in mind when opening this Jira?




> Add support for Apache Arrow
> ----------------------------
>
>                 Key: FLINK-10929
>                 URL: https://issues.apache.org/jira/browse/FLINK-10929
>             Project: Flink
>          Issue Type: Wish
>          Components: Runtime / State Backends
>            Reporter: Pedro Cardoso Silva
>            Priority: Minor
>         Attachments: image-2019-04-10-13-43-08-107.png
>
>
> Investigate the possibility of adding support for Apache Arrow as a standardized columnar, memory format for data.
> Given the activity that [https://github.com/apache/arrow] is currently getting and its claims objective of providing a zero-copy, standardized data format across platforms, I think it makes sense for Flink to look into supporting it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)