Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2015/04/25 01:34:38 UTC

[jira] [Updated] (TEZ-2368) Make the dag number available in Context classes

     [ https://issues.apache.org/jira/browse/TEZ-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-2368:
--------------------------------
    Attachment: TEZ-2368.1.txt

Straightforward patch, with a unit test that is not particularly useful. [~hitesh], [~rajesh.balamohan], [~bikassaha] - please review.

> Make the dag number available in Context classes
> ------------------------------------------------
>
>                 Key: TEZ-2368
>                 URL: https://issues.apache.org/jira/browse/TEZ-2368
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-2368.1.txt
>
>
> Provide the dag number, which is unique for each DAG running within an application, in TezInputContext, TezOutputContext and TezProcessorContext.
> When containers are re-used, or for external services, this can be used to write intermediate data to a DAG-specific directory instead of an application-specific directory, where it is difficult to tell the data of different DAGs apart.
> The DAG name also identifies a DAG, but it is not suitable for use in a directory name. Hashing the name is an option, but can lead to collisions.
> Writing data into a DAG-specific directory will only become usable once we move away from the default MR handler, or enhance it to support an additional parameter.
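
As a rough illustration of the directory layout described above, here is a minimal sketch. It is not the attached patch and does not use any Tez API: the class DagDirs, the method dagSpecificDir, the "dag_" prefix, the base path and the application id are all hypothetical names chosen for this example. The last line of main shows why hashing the DAG name is risky, using two short strings that are known to share the same String.hashCode().

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tez: builds a per-DAG directory
// under an application-specific directory.
final class DagDirs {
    private DagDirs() {}

    // Assumed layout: <baseDir>/<applicationId>/dag_<dagNumber>
    static Path dagSpecificDir(Path baseDir, String applicationId, int dagNumber) {
        if (dagNumber < 0) {
            throw new IllegalArgumentException("dagNumber must be non-negative: " + dagNumber);
        }
        return baseDir.resolve(applicationId).resolve("dag_" + dagNumber);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Path base = Paths.get("intermediate");
        String appId = "application_0000000000000_0001";

        // Two DAGs in the same application get distinct directories.
        System.out.println(dagSpecificDir(base, appId, 1));
        System.out.println(dagSpecificDir(base, appId, 2));

        // Hashing names can collide: these two distinct strings have equal hash codes.
        System.out.println("Aa".hashCode() == "BB".hashCode());
    }
}
```

Because the dag number is a plain integer, the resulting path component needs no escaping, which is the main advantage over deriving the directory from the DAG name.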



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)