You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "zhijiang (JIRA)" <ji...@apache.org> on 2018/09/04 09:17:00 UTC

[jira] [Commented] (FLINK-4175) Broadcast data sent increases with # slots per TM

    [ https://issues.apache.org/jira/browse/FLINK-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602806#comment-16602806 ] 

zhijiang commented on FLINK-4175:
---------------------------------

I am interested in this improvement and want to know what is the current status of it?

> Broadcast data sent increases with # slots per TM
> -------------------------------------------------
>
>                 Key: FLINK-4175
>                 URL: https://issues.apache.org/jira/browse/FLINK-4175
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core, Distributed Coordination
>    Affects Versions: 1.0.3
>            Reporter: Felix Neutatz
>            Assignee: Felix Neutatz
>            Priority: Major
>              Labels: performance
>
> Problem:
> we experience some unexpected increase of data sent over the network for broadcasts with increasing number of slots per Taskmanager.
> We provided a benchmark [1]. It not only increases the size of data sent over the network but also hurts performance as seen in the preliminary results below. In this results cloud-11 has 25 nodes and ibm-power has 8 nodes with scaling the number of slots per node from 1 - 16.
> +-----------------------+--------------+-------------+
> | suite                 | name         | median_time |
> +=======================+==============+=============+
> | broadcast.cloud-11    | broadcast.01 |        8796 |
> | broadcast.cloud-11    | broadcast.02 |       14802 |
> | broadcast.cloud-11    | broadcast.04 |       30173 |
> | broadcast.cloud-11    | broadcast.08 |       56936 |
> | broadcast.cloud-11    | broadcast.16 |      117507 |
> | broadcast.ibm-power-1 | broadcast.01 |        6807 |
> | broadcast.ibm-power-1 | broadcast.02 |        8443 |
> | broadcast.ibm-power-1 | broadcast.04 |       11823 |
> | broadcast.ibm-power-1 | broadcast.08 |       21655 |
> | broadcast.ibm-power-1 | broadcast.16 |       37426 |
> +-----------------------+--------------+-------------+
> After looking into the code base it, it seems that the data is de-serialized only once per TM, but the actual data is sent for all slots running the operator with broadcast vars and just gets discarded in case its already de-serialized.
> We do not see a reason the data can't be shared among the slots of a TM and therefore just sent once.
> [1] https://github.com/TU-Berlin-DIMA/flink-broadcast
> This Jira will continue the discussion started here: https://mail-archives.apache.org/mod_mbox/flink-dev/201606.mbox/%3C1465386300767.94345@tu-berlin.de%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)