Posted to yarn-issues@hadoop.apache.org by "Li Lu (JIRA)" <ji...@apache.org> on 2015/09/01 01:52:46 UTC

[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run table

    [ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724416#comment-14724416 ] 

Li Lu commented on YARN-3901:
-----------------------------

Hi [~vrushalic], I thought you might be working on a new version of the patch, so perhaps I should wait for it before posting my detailed comments? Back to the previous discussions:

bq. During table creation time, we specify the coprocessor class. This can also be done later by alter table command as desired.
This is fine for our PoC. However, we do need to think about deployment in the future, along with other challenges such as Phoenix and/or offline aggregation.

bq. There are some differences between the two aggregations, I think. Not sure if the classes can be reused without complicating development efforts. For the PoC I would like to focus on these tables independently. We could file follow up jiras to refactor the code as we see fit when the whole picture emerges, does that sound good?
Sure. Actually, I was wondering whether the strategy we're using here also applies to app-level aggregation, since both receive online data and store it in HBase (our online storage, as opposed to Phoenix). Our time-based aggregator works quite differently: it reads data back from the online storage and aggregates it in batches. That said, maybe we should use the approach in this patch as a general "online aggregation" approach, and provide aggregate APIs in the timeline metric class for offline aggregators?
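A minimal sketch of that proposal, assuming a hypothetical metric class that declares its aggregation operation once, so that an online path (folding values in one at a time, as a coprocessor would) and an offline path (aggregating a batch read back from storage) share the same logic. `AggregationOpSketch` and `AggregatableMetricSketch` are illustrative names, not existing YARN classes.

```java
import java.util.List;
import java.util.function.LongBinaryOperator;

// Hypothetical: the operation a metric declares for aggregation.
enum AggregationOpSketch {
  SUM((a, b) -> a + b),
  MIN(Math::min),
  MAX(Math::max);

  private final LongBinaryOperator op;

  AggregationOpSketch(LongBinaryOperator op) {
    this.op = op;
  }

  long apply(long a, long b) {
    return op.applyAsLong(a, b);
  }
}

// Hypothetical stand-in for a timeline metric that exposes an aggregate API.
final class AggregatableMetricSketch {
  final String id;
  final AggregationOpSketch op;
  private long value;
  private boolean empty = true;

  AggregatableMetricSketch(String id, AggregationOpSketch op) {
    this.id = id;
    this.op = op;
  }

  // Online path: fold in one value at a time (e.g. as cells are written or read).
  void accumulate(long v) {
    value = empty ? v : op.apply(value, v);
    empty = false;
  }

  long value() {
    if (empty) {
      throw new IllegalStateException("no values aggregated for " + id);
    }
    return value;
  }

  // Offline path: aggregate a batch of values read back from storage,
  // reusing exactly the same operation as the online path.
  static AggregatableMetricSketch aggregate(String id, AggregationOpSketch op, List<Long> batch) {
    AggregatableMetricSketch m = new AggregatableMetricSketch(id, op);
    for (long v : batch) {
      m.accumulate(v);
    }
    return m;
  }
}
```

Keeping the operation on the metric itself means the online and offline aggregators cannot disagree about how values combine.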

> Populate flow run data in the flow_run table
> --------------------------------------------
>
>                 Key: YARN-3901
>                 URL: https://issues.apache.org/jira/browse/YARN-3901
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>         Attachments: YARN-3901-YARN-2928.1.patch, YARN-3901-YARN-2928.WIP.2.patch, YARN-3901-YARN-2928.WIP.patch
>
>
> As per the schema proposed in YARN-3815 in https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf
> filing this JIRA to track the creation and population of data in the flow run table.
> Some points that are being considered:
> - Stores per flow run information aggregated across applications, as well as the flow version
> - The RM's collector writes to it on app creation and app completion
> - The per-app collector writes to it for metric updates, at a lower frequency than the metric updates to the application table
> - Primary key: cluster ! user ! flow ! flow run id
> - Only the latest version of flow-level aggregated metrics will be kept, even if the entity and application level keep a timeseries.
> - The running_apps column will be incremented on app creation, and decremented on app completion.
> - For min_start_time, the RM writer will simply write a value tagged with the applicationId. A coprocessor will return the min value of all written values.
> - Upon flushes and compactions, the min value among all the cells of this column will be written to the cell without any tag (empty tag), and all the other cells will be discarded.
> - The same applies to max_end_time, except that the max value is kept.
> - Tags are represented as #type:value. The type can be unset (0), running (1) or complete (2). For metrics, only the cells of completed apps are collapsed on compaction.
> - The m! values are aggregated (summed) upon read. Only when applications are completed (indicated by tag type 2) can their values be collapsed.
> - The ids of applications that have completed and been aggregated into the flow numbers are retained in a separate column for historical tracking: we don't want to re-aggregate them upon replay.
> 
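The primary key bullet (cluster ! user ! flow ! flow run id) can be illustrated with a small sketch. It assumes plain string joining with "!" and no escaping; `FlowRunRowKeySketch` is a hypothetical helper, and the real schema may well encode the components differently (for example as bytes).

```java
// Hypothetical sketch of the flow_run row key "cluster ! user ! flow ! flow run id".
// Plain strings are used for readability; this is not the actual YARN encoding.
final class FlowRunRowKeySketch {
  static final String SEPARATOR = "!";

  private FlowRunRowKeySketch() {}

  // Builds the full row key for one flow run.
  static String rowKey(String cluster, String user, String flow, long flowRunId) {
    return flowPrefix(cluster, user, flow) + flowRunId;
  }

  // Prefix shared by every run of a flow. HBase sorts row keys lexicographically,
  // so a prefix scan returns all runs of the flow as one contiguous range.
  static String flowPrefix(String cluster, String user, String flow) {
    for (String part : new String[] {cluster, user, flow}) {
      if (part.contains(SEPARATOR)) {
        // This sketch does no escaping, so reject ambiguous components.
        throw new IllegalArgumentException("component contains separator: " + part);
      }
    }
    return cluster + SEPARATOR + user + SEPARATOR + flow + SEPARATOR;
  }

  // Splits a row key back into its four components.
  static String[] parse(String rowKey) {
    String[] parts = rowKey.split(SEPARATOR, -1);
    if (parts.length != 4) {
      throw new IllegalArgumentException("expected 4 components: " + rowKey);
    }
    return parts;
  }
}
```

Putting cluster, user and flow first keeps all runs of a flow adjacent. Decimal strings do not sort numerically ("9" sorts after "10"), so a fixed-width encoding would likely be needed if run order within the prefix matters.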

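The min_start_time / max_end_time behavior described above (one tagged cell per application, the min or max returned on read, and everything collapsed into a single untagged cell on flush or compaction) can be modeled in memory. This is a hypothetical sketch, not HBase coprocessor code: `ExtremumColumnSketch` is an illustrative name, and a `null` app id stands in for the empty tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongBinaryOperator;

// Hypothetical in-memory model of the min_start_time / max_end_time columns.
final class ExtremumColumnSketch {
  private static final class Cell {
    final String appId; // null models the untagged cell left by compaction
    final long value;

    Cell(String appId, long value) {
      this.appId = appId;
      this.value = value;
    }
  }

  private final LongBinaryOperator pick; // Math::min or Math::max
  private final List<Cell> cells = new ArrayList<>();

  private ExtremumColumnSketch(LongBinaryOperator pick) {
    this.pick = pick;
  }

  static ExtremumColumnSketch minStartTime() {
    return new ExtremumColumnSketch(Math::min);
  }

  static ExtremumColumnSketch maxEndTime() {
    return new ExtremumColumnSketch(Math::max);
  }

  // RM writer: one cell per application, tagged with its id.
  void put(String appId, long timestamp) {
    cells.add(new Cell(appId, timestamp));
  }

  // Read path (the coprocessor's role): reduce all cells to one value.
  long read() {
    if (cells.isEmpty()) {
      throw new IllegalStateException("no cells written");
    }
    long result = cells.get(0).value;
    for (Cell c : cells) {
      result = pick.applyAsLong(result, c.value);
    }
    return result;
  }

  // Flush/compaction: keep a single untagged cell holding the extremum.
  void compact() {
    if (cells.isEmpty()) {
      return;
    }
    long result = read();
    cells.clear();
    cells.add(new Cell(null, result));
  }

  boolean hasOnlyUntaggedCell() {
    return cells.size() == 1 && cells.get(0).appId == null;
  }
}
```

Because min and max do not depend on whether an application has finished, all cells can be collapsed regardless of their tag type, unlike the summed metrics.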

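The metric (m!) rules above can likewise be modeled: values are summed on read, only cells tagged complete (type 2) are collapsed on compaction, and collapsed application ids are remembered so a replayed write is not aggregated twice. This is a hypothetical in-memory sketch; `FlowRunMetricColumnSketch` is an illustrative name, and it assumes each write carries the application's current cumulative value, so only the latest cell per application counts.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of one flow_run metric (m!) column; not HBase code.
final class FlowRunMetricColumnSketch {
  // Tag types as listed in the description.
  static final int TYPE_UNSET = 0;
  static final int TYPE_RUNNING = 1;
  static final int TYPE_COMPLETE = 2;

  private static final class Cell {
    final int type;
    final long value;

    Cell(int type, long value) {
      this.type = type;
      this.value = value;
    }
  }

  // Latest tagged cell per application (assumes cumulative values per write).
  private final Map<String, Cell> tagged = new LinkedHashMap<>();
  // Value of the untagged cell produced by earlier compactions.
  private long collapsed = 0;
  // Apps already folded into 'collapsed', kept so replays are not re-aggregated.
  private final Set<String> aggregatedApps = new HashSet<>();

  void put(String appId, int type, long value) {
    if (aggregatedApps.contains(appId)) {
      return; // replayed write for an app already in the flow totals
    }
    tagged.put(appId, new Cell(type, value));
  }

  // Read path: sum the collapsed cell and the latest value of every other app.
  long read() {
    long sum = collapsed;
    for (Cell c : tagged.values()) {
      sum += c.value;
    }
    return sum;
  }

  // Flush/compaction: collapse only the cells of completed apps.
  void compact() {
    Iterator<Map.Entry<String, Cell>> it = tagged.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Cell> e = it.next();
      if (e.getValue().type == TYPE_COMPLETE) {
        collapsed += e.getValue().value;
        aggregatedApps.add(e.getKey());
        it.remove();
      }
    }
  }

  int uncollapsedCells() {
    return tagged.size();
  }
}
```

Running applications keep their own cells because their values may still change; once an application completes, its final value can safely be folded into the flow total.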

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)