Posted to issues@tez.apache.org by "László Bodor (Jira)" <ji...@apache.org> on 2020/05/07 17:15:00 UTC

[jira] [Commented] (TEZ-4137) Input/Output/Processor should merge payload to local conf

    [ https://issues.apache.org/jira/browse/TEZ-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101893#comment-17101893 ] 

László Bodor commented on TEZ-4137:
-----------------------------------

At the moment I'm considering container mode, as the patch obviously needs to work for that in the first round.
So yes, tez-conf.pb is propagated to both the AM and the containers, where it is read by TezUtilsInternal.readUserSpecifiedTezConfiguration, so the parts that use getContainerConfiguration make sense to me.

Also, the unit tests seem to cover the config merge at the end of the chain. However, I'm personally more interested in an end-to-end style unit test that reflects the change at a higher level, if possible. With this patch, upstream components could assume that they don't have to re-add configuration to the vertex payload when it is already covered by tez-conf.pb. So in the Tez codebase we need to make sure that a config set at the beginning of the chain (DAG submission) is indeed present at the end, even when the user payload does not contain that property.
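A minimal sketch of the end-to-end check described above, assuming a Tez Configuration can be modeled as a plain Map<String, String>. The class name and `mergePayload` are hypothetical illustrations, not Tez APIs, and the two property keys are only examples. It shows a property set at submission time surviving the merge even though the payload never carries it.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: configs are modeled as Map<String, String>;
 * mergePayload is an illustrative helper, not part of Tez.
 */
public class PayloadMergeSketch {

    /** Returns a copy of the local config with every payload entry layered on top. */
    static Map<String, String> mergePayload(Map<String, String> localConf,
                                            Map<String, String> payloadConf) {
        Map<String, String> merged = new HashMap<>(localConf);
        merged.putAll(payloadConf); // payload values win on key conflicts
        return merged;
    }

    public static void main(String[] args) {
        // Config set by the client at DAG submission; assumed to reach the
        // container unchanged via tez-conf.pb.
        Map<String, String> submitConf = new HashMap<>();
        submitConf.put("tez.runtime.io.sort.mb", "512");

        // Container-side local config, i.e. what tez-conf.pb deserializes to.
        Map<String, String> containerLocalConf = new HashMap<>(submitConf);

        // The vertex user payload deliberately does NOT repeat the property.
        Map<String, String> payloadConf = new HashMap<>();
        payloadConf.put("tez.runtime.compress", "true");

        Map<String, String> effective = mergePayload(containerLocalConf, payloadConf);

        // End-to-end check: the property set at submission is still visible.
        if (!"512".equals(effective.get("tez.runtime.io.sort.mb"))) {
            throw new AssertionError("config set at submission was lost");
        }
        if (payloadConf.containsKey("tez.runtime.io.sort.mb")) {
            throw new AssertionError("payload should not carry the property");
        }
        System.out.println("effective config: " + effective);
    }
}
```

A real test would drive this through DAG submission and a running task rather than plain maps; the sketch only pins down the assertion such a test would make.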

But before proceeding, [~jeagles], could you please share your thoughts on this patch? It makes sense to me, but I'm not 100% confident about the overall picture, and you mentioned you've already thought about some config/payload optimizations.

> Input/Output/Processor should merge payload to local conf
> ---------------------------------------------------------
>
>                 Key: TEZ-4137
>                 URL: https://issues.apache.org/jira/browse/TEZ-4137
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: TEZ-4137.1.patch, TEZ-4137.2.patch, TEZ-4137.3.patch, TEZ-4137.4.patch, TEZ-4137.4.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This patch introduces config merging to various Inputs, Outputs and Processors. As described in https://issues.apache.org/jira/browse/TEZ-4073 , we need to reduce the size of the configuration objects transferred over the wire. There are two improvements we are planning regarding that:
>  # Skip sending default configs and configs coming from XML files in the payload.
>  # Send DAG, vertex and session configurations in layers instead of sending DAG + vertex + session configs all together three times.
> In order to achieve these,
>  * We need to expose the local config on the task side through TaskContext.
>  * Inputs/Outputs/Processors must merge the config from the user payload into the local config in their TaskContext.
> Since runtime components previously had no access to the local config, Tez clients had to send all config required at runtime in the user payload. After this change, Tez clients can reduce their payload size.
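The layering idea in the second planned improvement can be sketched as follows. This is a hypothetical illustration: configs are plain maps, `resolve` and the class name are not Tez APIs, and the session < DAG < vertex precedence is an assumption, not something the issue states. Each layer holds only its own overrides, and the effective config is built by applying the layers in order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of layered config resolution. The layer names and the
 * session < DAG < vertex precedence are assumptions for illustration only.
 */
public class LayeredConfSketch {

    /** Applies layers from lowest to highest priority; later layers override earlier ones. */
    static Map<String, String> resolve(List<Map<String, String>> layersLowToHigh) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : layersLowToHigh) {
            effective.putAll(layer);
        }
        return effective;
    }

    public static void main(String[] args) {
        // Each layer carries only the keys it sets, so nothing is repeated.
        Map<String, String> session = Map.of("tez.runtime.io.sort.mb", "256",
                                             "tez.runtime.compress", "false");
        Map<String, String> dag = Map.of("tez.runtime.compress", "true");
        Map<String, String> vertex = Map.of("tez.runtime.io.sort.mb", "512");

        Map<String, String> effective = resolve(List.of(session, dag, vertex));
        System.out.println(effective.get("tez.runtime.io.sort.mb")); // 512
        System.out.println(effective.get("tez.runtime.compress"));   // true
    }
}
```

Keeping the layers separate means a value shared by every vertex is shipped once in its own layer instead of being copied into each combined config.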



--
This message was sent by Atlassian Jira
(v8.3.4#803005)