Posted to issues@tez.apache.org by "Jonathan Turner Eagles (Jira)" <ji...@apache.org> on 2020/04/17 16:39:00 UTC

[jira] [Comment Edited] (TEZ-4141) Let Input/Output Processors load local xml configs

    [ https://issues.apache.org/jira/browse/TEZ-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085915#comment-17085915 ] 

Jonathan Turner Eagles edited comment on TEZ-4141 at 4/17/20, 4:38 PM:
-----------------------------------------------------------------------

I don't think this will generally be a good way to save bytes going over the wire. Reading local config files like this weakens the idempotency of jobs, since the configuration can differ across attempts. As new software rolls out across the cluster, the guarantee that a job produces the correct result is weakened. In addition, there is no historical record of the configuration that was used, which makes debugging difficult or impossible. Lastly, reading configuration from local disk is strictly slower than the process we have now, where the full configuration arrives in the payload.

It would be better to reduce the configuration through a filter, or to structure the configuration hierarchically as the DAG configuration + a delta for each vertex + a delta for each task.
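A minimal sketch of the filter and the hierarchical layout, in Python with plain dictionaries standing in for Tez/Hadoop configuration objects (an assumption; no Tez API is used). `filter_by_prefix`, `delta` and `resolve` are hypothetical helpers, and the `example.*` keys are made up. Only the DAG-level map would be shipped in full; each vertex and task carries just the entries that differ from the level above, and the receiver rebuilds the full map by layering them back in order.

```python
from typing import Dict, Iterable

# Plain dicts stand in for a job configuration (hypothetical, not a Tez API).
Config = Dict[str, str]


def filter_by_prefix(conf: Config, prefixes: Iterable[str]) -> Config:
    """Keep only keys that start with one of the given prefixes."""
    wanted = tuple(prefixes)
    return {k: v for k, v in conf.items() if k.startswith(wanted)}


def delta(base: Config, derived: Config) -> Config:
    """Entries of `derived` that are new or differ from `base`.

    Assumes lower levels only add or override keys; deleting a key
    inherited from `base` is not representable in this sketch.
    """
    return {k: v for k, v in derived.items() if base.get(k) != v}


def resolve(dag: Config, vertex_delta: Config, task_delta: Config) -> Config:
    """Rebuild a task's full config: DAG first, then vertex delta, then task delta."""
    merged = dict(dag)
    merged.update(vertex_delta)
    merged.update(task_delta)
    return merged
```

For example, if the DAG sets `example.sort.mb` to `100` and a vertex raises it to `256`, the vertex delta is just `{"example.sort.mb": "256"}`, and `resolve` reproduces the vertex's full configuration from the DAG map plus that one entry.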

Another technique may be to sort the configuration keys so that related keys sit next to each other, which improves locality for the compression codec.
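A sketch of the key-sorting idea, again in Python. It uses zlib (DEFLATE) purely as a stand-in codec and a simple `key=value` line format rather than Tez's actual payload encoding; the function names are hypothetical. Sorting places keys that share a prefix next to each other, so the codec tends to see longer repeated runs at shorter distances. A side benefit is that the same configuration always serializes to the same bytes regardless of insertion order.

```python
import zlib
from typing import Dict


def serialize_sorted(conf: Dict[str, str]) -> bytes:
    # One "key=value" line per entry, ordered by key so that keys sharing a
    # prefix end up adjacent in the byte stream.
    # Assumes keys contain no "=" and neither keys nor values contain newlines.
    lines = (f"{k}={v}" for k, v in sorted(conf.items()))
    return "\n".join(lines).encode("utf-8")


def compress_conf(conf: Dict[str, str]) -> bytes:
    # zlib is only a stand-in for whatever codec the real payload uses.
    return zlib.compress(serialize_sorted(conf), 9)


def decompress_conf(blob: bytes) -> Dict[str, str]:
    text = zlib.decompress(blob).decode("utf-8")
    if not text:
        return {}
    # Split on the first "=" only, so values may themselves contain "=".
    return dict(line.split("=", 1) for line in text.split("\n"))
```

Whether the sort pays off for a given codec is easy to check empirically: on a real job configuration, compare `len(compress_conf(conf))` against compressing the same entries in their original order.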



> Let Input/Output Processors load local xml configs
> --------------------------------------------------
>
>                 Key: TEZ-4141
>                 URL: https://issues.apache.org/jira/browse/TEZ-4141
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Mustafa Iman
>            Assignee: Mustafa Iman
>            Priority: Major
>         Attachments: TEZ-4141.1.patch
>
>
> We would like to reduce the amount of configuration going over the wire from a client to the application master. If Input/Output processors load local config files, we can reduce the configuration overhead when the client and the processors have exactly the same config. It is the client user's responsibility to keep the configs identical on both sides. Currently, clients have to send the entire config in the payload. Even if we preload config from local xml files, it should be overridden by the full config object coming in the payload. Therefore, old clients that send all the config anyway would not be affected in terms of correctness by this change.
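A minimal sketch of the precedence the proposal describes: values from local XML files are loaded first, and every key present in the payload overrides them. It parses the Hadoop-style layout (a `<configuration>` root holding `<property>` elements with `<name>` and `<value>` children) using Python's `xml.etree.ElementTree`, and ignores features such as `<final>`, XInclude and variable expansion. `load_local_xml` and `effective_conf` are hypothetical names, not Tez APIs. In this sketch the payload only decides the keys it actually contains; a key present only in the local files still ends up in the effective configuration.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def load_local_xml(xml_text: str) -> Dict[str, str]:
    """Read <property><name>/<value> pairs from Hadoop-style configuration XML.

    Sketch only: <final>, XInclude and variable expansion are ignored.
    """
    root = ET.fromstring(xml_text)
    conf: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:  # skip properties without a name
            # A missing <value> element is treated as an empty string.
            conf[name] = prop.findtext("value", default="")
    return conf


def effective_conf(local_xml: str, payload: Dict[str, str]) -> Dict[str, str]:
    """Local XML provides defaults; every key present in the payload wins."""
    merged = load_local_xml(local_xml)
    merged.update(payload)
    return merged
```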



--
This message was sent by Atlassian Jira
(v8.3.4#803005)