Posted to yarn-issues@hadoop.apache.org by "Zhijie Shen (JIRA)" <ji...@apache.org> on 2015/04/02 02:39:54 UTC
[jira] [Comment Edited] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

    [ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391864#comment-14391864 ] 

Zhijie Shen edited comment on YARN-3391 at 4/2/15 12:39 AM:
------------------------------------------------------------

Sangjin, thanks for your comments, too. From your and Joep's comments, I can see the benefit of showing aggregated application information by application (type). However, IMHO, it's orthogonal to the flow definition. Isn't the straightforward approach to provide this feature by aggregating on the application name/type dimension, instead of letting flow name = application name?

On the other hand, a flow should semantically stand for a *workflow* (correct me if I'm wrong about the flow concept), which contains a group of applications that work together to solve a problem. Making flow name == application name changes the semantics: a flow of applications would then mean the applications of the same type.

{quote}
 If a user is running TestDFSIO over and over, they should be recognized as different instances of the same thing.
{quote}

I guess the "same thing" you had in mind is not the same workflow, but the same application type, right? And back to Joep's web UI example, it's better described as "getting sum(cost) from apps where app_name(type) = sleep". Therefore, how about we decouple the two concepts? Taking one step back, when users set the flow explicitly, are they going to tell the application that it belongs to workflow ABC, or that it belongs to job type XYZ? I think it will be the former.
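
To illustrate the decoupling, here is a minimal sketch, assuming each application record carries an application name and a flow name as two independent fields. The record layout, the field names (app_name, flow_name, cost), and the helper sum_cost_by are hypothetical and are not part of the timeline service API or the attached patch.

```python
from collections import defaultdict

# Hypothetical application records. The field names are illustrative only
# and are NOT taken from the YARN timeline service API.
apps = [
    {"app_id": "app_1", "app_name": "sleep", "flow_name": "ABC", "cost": 3.0},
    {"app_id": "app_2", "app_name": "sleep", "flow_name": "DEF", "cost": 2.0},
    {"app_id": "app_3", "app_name": "TestDFSIO", "flow_name": "ABC", "cost": 5.0},
]


def sum_cost_by(records, dimension):
    """Sum the 'cost' field of the records, grouped by the given field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cost"]
    return dict(totals)


# "sum(cost) from apps where app_name(type) = sleep" is an aggregation on
# the application name dimension; the flow name is not involved.
cost_by_app_name = sum_cost_by(apps, "app_name")

# Aggregating by workflow is a separate query on a separate field.
cost_by_flow = sum_cost_by(apps, "flow_name")
```

With the two fields kept apart, the same "sleep" application can run inside different workflows, and the per-type total and the per-workflow total are computed independently.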


was (Author: zjshen):
Sangjin, thanks for your comments, too. From your and Joep's comments, I can see the benefit of showing aggregated application information by application (type). However, IMHO, it's orthogonal to the flow definition. Isn't the straightforward approach to provide this feature by aggregating on the application name/type dimension, instead of letting flow name = application name?

On the other hand, a flow should semantically stand for a *workflow* (correct me if I'm wrong about the flow concept), which contains a group of applications that work together to solve a problem. Making flow name == application name changes the semantics: a flow of applications would then mean the applications of the same type.

{quote}
 If a user is running TestDFSIO over and over, they should be recognized as different instances of the same thing.
{quote}

I guess the "same thing" you had in mind is not the same workflow, but the same application type, right? How about we decouple the two concepts? Taking one step back, when users set the flow explicitly, are they going to tell the application that you belong to workflow abc, or that you belong to job type xyz? I think it will be the former.

> Clearly define flow ID/ flow run / flow version in API and storage
> ------------------------------------------------------------------
>
>                 Key: YARN-3391
>                 URL: https://issues.apache.org/jira/browse/YARN-3391
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Zhijie Shen
>         Attachments: YARN-3391.1.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not set it)
> - How do we handle flow attributes in case of nested levels of flows?


