Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2019/09/02 08:55:00 UTC

[jira] [Commented] (FLINK-4399) Add support for oversized messages

    [ https://issues.apache.org/jira/browse/FLINK-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920707#comment-16920707 ] 

Till Rohrmann commented on FLINK-4399:
--------------------------------------

Hi [~SleePy], thanks for creating the design document. I really like your proposal to split large messages into smaller chunks. The main question I have is whether we really need this feature. As I've written in the Google doc, I assume that we would also define an upper bound for the message size, because otherwise large messages can easily cause OOM errors. Instead, it could be enough to add some utilities to offload large payloads to the {{BlobServer}} and then handle all messages that can carry a large user code payload at a higher level, as we do with the {{TaskDeploymentDescriptor}}.
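
To make the chunking idea and the upper bound concrete, here is a minimal Java sketch. It is not taken from the design document or from Flink: the class {{ChunkingSketch}}, the methods {{split}} and {{reassemble}}, and the parameters {{chunkSize}} and {{maxMessageSize}} are all hypothetical names chosen for illustration. It only shows that both the sender and the receiver would need to enforce the bound to avoid unbounded memory use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: hypothetical names, not Flink's RPC implementation. */
public class ChunkingSketch {

    /** Splits a message into chunks of at most chunkSize bytes, rejecting oversized input. */
    static List<byte[]> split(byte[] message, int chunkSize, int maxMessageSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (message.length > maxMessageSize) {
            // Assumed policy: refuse to send anything above the configured upper bound.
            throw new IllegalArgumentException("message exceeds upper bound");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < message.length; offset += chunkSize) {
            int end = Math.min(message.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(message, offset, end));
        }
        return chunks;
    }

    /** Reassembles chunks, enforcing the same bound on the receiver side. */
    static byte[] reassemble(List<byte[]> chunks, int maxMessageSize) {
        long total = 0;
        for (byte[] chunk : chunks) {
            total += chunk.length;
            if (total > maxMessageSize) {
                throw new IllegalStateException("reassembled message exceeds upper bound");
            }
        }
        byte[] result = new byte[(int) total];
        int offset = 0;
        for (byte[] chunk : chunks) {
            System.arraycopy(chunk, 0, result, offset, chunk.length);
            offset += chunk.length;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] message = new byte[25];
        List<byte[]> chunks = split(message, 10, 1024);
        byte[] restored = reassemble(chunks, 1024);
        System.out.println(chunks.size() + " chunks, round trip ok: "
                + Arrays.equals(message, restored));
    }
}
```

The receiver still has to hold the whole reassembled message in memory, which is why the bound matters regardless of how the bytes are transported.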

> Add support for oversized messages
> ----------------------------------
>
>                 Key: FLINK-4399
>                 URL: https://issues.apache.org/jira/browse/FLINK-4399
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>         Environment: FLIP-6 feature branch
>            Reporter: Stephan Ewen
>            Assignee: Biao Liu
>            Priority: Major
>              Labels: flip-6
>
> Currently, messages larger than the maximum Akka frame size cause an error when being transported. We should add a way to pass messages that are larger than the frame size, as may happen for:
>   - {{collect()}} calls that collect large data sets (via accumulators)
>   - Job submissions and operator deployments where the function closures are large (for example because they contain large pre-loaded data)
>   - Function restore in cases where restored state is larger than checkpointed state (union state)
> I suggest using the {{BlobManager}} to transfer large payloads.
>   - On the sender side, oversized messages are stored under a transient blob (which is deleted after first retrieval, or after a certain number of minutes)
>   - The sender sends a "pointer to blob message" instead.
>   - The receiver grabs the message from the blob upon receiving the pointer message
> The RPC Service should be optionally initializable with a "large message handler" which is internally the {{BlobManager}}.
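
The pointer-to-blob flow described in the issue can be sketched roughly as follows in Java. This is a hedged illustration, not Flink code: {{InMemoryBlobStore}}, {{Envelope}}, {{wrap}}, {{unwrap}} and {{maxFrameSize}} are hypothetical names, and the in-memory map only stands in for a transient blob store. The real {{BlobManager}} / {{BlobServer}} API is different and is not modeled here.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: hypothetical names, not Flink's BlobManager or RPC API. */
public class OversizedMessageSketch {

    /** Stand-in for a transient blob store: a blob is deleted on first retrieval. */
    static final class InMemoryBlobStore {
        private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

        String put(byte[] payload) {
            String key = UUID.randomUUID().toString();
            blobs.put(key, payload.clone());
            return key;
        }

        byte[] takeOnce(String key) {
            byte[] payload = blobs.remove(key);
            if (payload == null) {
                throw new IllegalStateException("no blob stored under key " + key);
            }
            return payload;
        }

        int size() {
            return blobs.size();
        }
    }

    /** What travels over RPC: either the payload itself or a pointer to a blob. */
    static final class Envelope {
        final byte[] inlinePayload; // null when the payload was offloaded
        final String blobKey;       // null when the payload is inline

        Envelope(byte[] inlinePayload, String blobKey) {
            this.inlinePayload = inlinePayload;
            this.blobKey = blobKey;
        }

        boolean isOffloaded() {
            return blobKey != null;
        }
    }

    /** Sender side: offload payloads larger than the frame size, send small ones inline. */
    static Envelope wrap(byte[] payload, int maxFrameSize, InMemoryBlobStore store) {
        if (payload.length <= maxFrameSize) {
            return new Envelope(payload, null);
        }
        return new Envelope(null, store.put(payload));
    }

    /** Receiver side: fetch the payload from the blob store when given a pointer. */
    static byte[] unwrap(Envelope envelope, InMemoryBlobStore store) {
        return envelope.isOffloaded() ? store.takeOnce(envelope.blobKey) : envelope.inlinePayload;
    }

    public static void main(String[] args) {
        InMemoryBlobStore store = new InMemoryBlobStore();
        Envelope envelope = wrap(new byte[4096], 1024, store);
        byte[] received = unwrap(envelope, store);
        System.out.println("offloaded: " + envelope.isOffloaded()
                + ", received bytes: " + received.length);
    }
}
```

The sketch omits the time-based expiry mentioned in the issue (deletion "after a certain number of minutes"); a real implementation would also need that cleanup path for pointers that are never dereferenced.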



--
This message was sent by Atlassian Jira
(v8.3.2#803003)