Posted to issues@tez.apache.org by "Mustafa Iman (Jira)" <ji...@apache.org> on 2020/04/20 06:05:00 UTC

[jira] [Updated] (TEZ-4137) Input/Output/Processor should merge payload to local conf

     [ https://issues.apache.org/jira/browse/TEZ-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mustafa Iman updated TEZ-4137:
------------------------------
    Description: 
This patch introduces config merging to various Input and Output processors. As described in https://issues.apache.org/jira/browse/TEZ-4073 , we need to reduce the size of the configuration objects transferred over the wire. We plan to make two improvements in that regard:
 # Skip sending default configs and configuration coming from XML files in the payload.
 # Send DAG, vertex and session configurations in layers instead of sending the DAG + vertex + session configs all together three times.

To achieve these:
 * We need to expose the local config on the task side through TaskContext.
 * Inputs, Outputs and Processors must merge the config from the user payload into the local config in their TaskContext.

Since runtime components did not have access to the local config before, Tez clients sent all config required at runtime in the user payload. After this change, Tez clients can reduce their payload size.
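
To illustrate the idea, here is a minimal sketch using plain java.util maps rather than the actual Tez or Hadoop classes. The class name PayloadMergeSketch, the methods stripKnown and merge, and the example.* keys are all hypothetical and are not taken from the patch. It also assumes that payload values take precedence over local values when both define the same key, which this description does not state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the code from the TEZ-4137 patch.
public class PayloadMergeSketch {

    // Client side: keep only the entries whose value differs from what the
    // task is assumed to have locally (defaults and XML-file settings).
    // Values are assumed to be non-null.
    static Map<String, String> stripKnown(Map<String, String> full,
                                          Map<String, String> local) {
        Map<String, String> delta = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : full.entrySet()) {
            if (!e.getValue().equals(local.get(e.getKey()))) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        return delta;
    }

    // Task side: start from the local config and layer the payload on top.
    // Assumption: the payload wins when both maps define the same key.
    static Map<String, String> merge(Map<String, String> local,
                                     Map<String, String> payload) {
        Map<String, String> merged = new LinkedHashMap<>(local);
        merged.putAll(payload);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> local = new LinkedHashMap<>();
        local.put("example.sort.mb", "100");
        local.put("example.compress", "false");

        // Full client-side config: one override and one extra key.
        Map<String, String> full = new LinkedHashMap<>(local);
        full.put("example.compress", "true");
        full.put("example.vertex.name", "map1");

        Map<String, String> payload = stripKnown(full, local);
        System.out.println("payload entries: " + payload.size());
        System.out.println("round trip ok: " + merge(local, payload).equals(full));
    }
}
```

The round trip only reproduces the full config when it contains every key of the local config; a delta built this way cannot express that a locally defined key should be unset.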

  was:
This patch introduces config merging to various Input and Output processors. As described in https://issues.apache.org/jira/browse/TEZ-4073 , we need to reduce the size of the configuration objects transferred over the wire. There are two improvements we are planning to do regarding to that:
 # Skip sending default configs and configuration coming from xml files in payload
 # Send dag, vertex and session configurations in layers instead of sending dag + vertex + session configs all together three times.

In order to achieve these,
 * We need to expose local config coming from configuration files to TaskContext.
 * Input/Output processors must merge the config from user payload to local config in their TaskContext

This is the configuration merging part. After this is merged, corresponding changes should be made on Hive side to prevent sending redundant configs. Until Hive side is updated, changes here are only overhead because all the config objects are the same and they have all the config options anyway.

        Summary: Input/Output/Processor should merge payload to local conf  (was: Input/Output processors should merge payload to local conf)

> Input/Output/Processor should merge payload to local conf
> ---------------------------------------------------------
>
>                 Key: TEZ-4137
>                 URL: https://issues.apache.org/jira/browse/TEZ-4137
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: TEZ-4137.1.patch, TEZ-4137.2.patch, TEZ-4137.3.patch, TEZ-4137.4.patch
>
>
> This patch introduces config merging to various Input and Output processors. As described in https://issues.apache.org/jira/browse/TEZ-4073 , we need to reduce the size of the configuration objects transferred over the wire. We plan to make two improvements in that regard:
>  # Skip sending default configs and configuration coming from XML files in the payload.
>  # Send DAG, vertex and session configurations in layers instead of sending the DAG + vertex + session configs all together three times.
> To achieve these:
>  * We need to expose the local config on the task side through TaskContext.
>  * Inputs, Outputs and Processors must merge the config from the user payload into the local config in their TaskContext.
> Since runtime components did not have access to the local config before, Tez clients sent all config required at runtime in the user payload. After this change, Tez clients can reduce their payload size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)