Posted to issues@flink.apache.org by "Guowei Ma (Jira)" <ji...@apache.org> on 2020/01/06 08:21:00 UTC

[jira] [Commented] (FLINK-15032) Remove the eager serialization from `RemoteRpcInvocation`

    [ https://issues.apache.org/jira/browse/FLINK-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008617#comment-17008617 ] 

Guowei Ma commented on FLINK-15032:
-----------------------------------

Thanks [~trohrmann] and [~SleePy] for raising your concerns. As [~SleePy] said, there is no good way to check the size of the payload without serializing it. To reduce the memory pressure, we could compute only the size of the payload and not keep the serialization result. However, that is useful only in specific scenarios, for example when the parallelism is large and the network is limited. So I will close this issue for now.
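
As a rough illustration of "compute the size but do not save the result", here is a minimal sketch using plain Java serialization. `SerializedSizeProbe` and `CountingOutputStream` are hypothetical names made up for this example; they are not Flink classes, and Flink's actual RPC code path may differ.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical helper (not part of Flink): measures the serialized size of a payload. */
final class SerializedSizeProbe {

    /** Discards every byte written to it and only keeps a running count. */
    private static final class CountingOutputStream extends OutputStream {
        private long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }

        long getCount() {
            return count;
        }
    }

    /** Serializes the payload into a counting sink, so no byte[] copy of it is retained. */
    static long serializedSize(Serializable payload) {
        CountingOutputStream counter = new CountingOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(counter)) {
            out.writeObject(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.getCount();
    }
}
```

The serialization work is still paid in CPU time, and the payload is serialized a second time when it is actually sent, which is why this only pays off when memory rather than CPU is the bottleneck.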

> Remove the eager serialization from `RemoteRpcInvocation` 
> ----------------------------------------------------------
>
>                 Key: FLINK-15032
>                 URL: https://issues.apache.org/jira/browse/FLINK-15032
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Guowei Ma
>            Priority: Major
>
>     Currently, the constructor of `RemoteRpcInvocation` serializes the `parameterTypes` and `arg` of an RPC call. This can lead to a problem:
> Consider a job that has a parallelism of 1k and a 1 MB union list state. When deploying the 1k tasks, the eager serialization would instantly use about 1 GB of memory (sometimes the serialization amplifies the memory usage further). However, the serialized object is only used when Akka sends the message. So we could reduce the memory pressure by serializing the object only when Akka is about to send the message.
> Akka serializes the message in the end anyway, and all the XXXGateway-related classes are visible at the RPC level. Because of that, I think the eager serialization in the constructor of `RemoteRpcInvocation` can be avoided. I also ran a simple test and found that this reduces the time cost of an RPC call. For 1k RPC calls with a 1m `String` message, the current version takes around 2700 ms; the version without eager serialization takes about 37 ms.
>  
> In summary, this Jira proposes to remove the eager serialization in the constructor of `RemoteRpcInvocation`.
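
The idea in the description above can be sketched as follows. This is only an illustration under stated assumptions: `LazyInvocation` and `serializeForSend` are hypothetical names, plain Java serialization stands in for whatever the RPC layer actually uses, and the real `RemoteRpcInvocation` is structured differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for an RPC invocation that defers serialization. */
final class LazyInvocation {
    private final String methodName;
    private final Object[] args;

    /** The constructor only stores references; no bytes are produced here. */
    LazyInvocation(String methodName, Object... args) {
        this.methodName = methodName;
        this.args = args;
    }

    /** Serializes on demand, i.e. at the point where the transport sends the message. */
    byte[] serializeForSend() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeUTF(methodName);
            out.writeObject(args);
        } catch (IOException e) {
            // Includes NotSerializableException for arguments that cannot be serialized.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

One consequence of this design is that a non-serializable or oversized argument is no longer detected when the invocation is constructed, only when it is sent. In the sketch, `new LazyInvocation("someRpcMethod", new Object())` constructs without error and fails only in `serializeForSend()`.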



--
This message was sent by Atlassian Jira
(v8.3.4#803005)