You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2016/07/19 21:12:20 UTC

[jira] [Comment Edited] (TEZ-3357) Change TimecachePlugin to return grouped entity ids.

    [ https://issues.apache.org/jira/browse/TEZ-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384796#comment-15384796 ] 

Hitesh Shah edited comment on TEZ-3357 at 7/19/16 9:11 PM:
-----------------------------------------------------------

Comments: 
  - TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP - usedNumDagsPerGroup - configs do not use camel case. Either "." or "-". 
  -  ConfigurationScope(Scope.AM) - does not make sense here at all. 
  - Why is TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT needed? Can't it fall back to the configured value used when writing/creating these groups?
  - Changes in TezUtilsInternal: In retrospect, the grouping logic can belong in TezDAGId.java. A couple of points though:
    - use just daggroup or groupId and do not reference timeline in docs or var names
    - s/DAGGROUP/daggroup/
    - s/AtsGroup/DagGroup/
 
  - For "TimelineEntityGroupPlugin implements Configurable" - how are we certain that YARN will invoke setConf on the plugin? The base class TimelineEntityGroupPlugin does not require a plugin to implement methods defined as part of the Configurable interface. Additionally, the config would be set in tez-site.xml - who would load this config file? 
     - What if setConf is never called? 

{code}
conf.get(TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP,
184	        TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT).split(",");
{code}
   - maybe use conf.getStrings() ? 

{code}
   * Comma separated list of numDagsPerGroup used until now. This is used by the TimelineCacheService to generate
1233	   * TimelineEntityGroupIds. Do not add too many here, it will affect the performance of ATS during query time.
{code}
   - Needs a bit more clarity on what this config implies or maybe a reference to the write path where this will be used? 

TestTimelineCachePluginImpl needs tests for where TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT is configured to empty i.e. <value></value> or more than one value i.e. "50,100".

{code}
    for (int i = 0; i < usedNumGroups.length; ++i) {
187	      allNumGroupsPerDag[i] = Integer.parseInt(usedNumGroups[i]);
188	    }
{code}
   - what if someone sets an invalid value? The stack trace without an accompaning message of which config property would make it tough for a user to figure out what is wrong. 






was (Author: hitesh):
Comments: 
  - TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP - usedNumDagsPerGroup - configs do not use camel case. Either "." or "-". 
  -  ConfigurationScope(Scope.AM) - does not make sense here at all. 
  - Why is TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT needed? Can't it fall back to the configured value used when writing/creating these groups?
  - Changes in TezUtilsInternal: In retrospect, the grouping logic can belong in TezDAGId.java. A couple of points though:
    - use just daggroup or groupId and do not reference timeline in docs or var names
    - s/DAGGROUP/daggroup/
    - s/AtsGroup/DagGroup/
 
  - For "TimelineEntityGroupPlugin implements Configurable" - how are we certain that YARN will invoke setConf on the plugin? The base class TimelineEntityGroupPlugin does not require a plugin to implement methods defined as part of the Configurable interface. 
     - What if setConf is never called? 

{code}
conf.get(TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP,
184	        TezConfiguration.TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT).split(",");
{code}
   - maybe use conf.getStrings() ? 

{code}
   * Comma separated list of numDagsPerGroup used until now. This is used by the TimelineCacheService to generate
1233	   * TimelineEntityGroupIds. Do not add too many here, it will affect the performance of ATS during query time.
{code}
   - Needs a bit more clarity on what this config implies or maybe a reference to the write path where this will be used? 

TestTimelineCachePluginImpl needs tests for where TEZ_HISTORY_LOGGING_USED_NUM_DAGS_PER_GROUP_DEFAULT is configured to empty i.e. <value></value> or more than one value i.e. "50,100".

{code}
    for (int i = 0; i < usedNumGroups.length; ++i) {
187	      allNumGroupsPerDag[i] = Integer.parseInt(usedNumGroups[i]);
188	    }
{code}
   - what if someone sets an invalid value? The stack trace without an accompaning message of which config property would make it tough for a user to figure out what is wrong. 





> Change TimecachePlugin to return grouped entity ids.
> ----------------------------------------------------
>
>                 Key: TEZ-3357
>                 URL: https://issues.apache.org/jira/browse/TEZ-3357
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Harish Jaiprakash
>            Assignee: Harish Jaiprakash
>         Attachments: TEZ-3357.01.patch
>
>
> TimelineCachePlugin has to return TimelineEntityGroupId grouped based on dag id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)