You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2015/06/18 20:56:00 UTC

[jira] [Commented] (TEZ-2565) Consider scanning unfinished tasks in VertexImpl::constructStatistics to reduce merge overhead

    [ https://issues.apache.org/jira/browse/TEZ-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592315#comment-14592315 ] 

Siddharth Seth commented on TEZ-2565:
-------------------------------------

Looking at this as part of TEZ-2496. Comments.
- Nit: "final Set<TezTaskID> taskSet;" - this could be a bitset
- "ioStats.putIfAbsent(name, new IOStatisticsImpl());" - Not a concurrentMap. A put should be sufficient ? There's other issues if we expect concurrent invocation of this method.
- {code}if (currentStatistics.containsTask(t.getTaskId())) { break;}{code} Should be a continue so that it moves forward and aggregates statistics for newly completed tasks.
- If we're only going to aggregate statistics when a task completes, this should be called out on the API. In theory, it's possible to report stats as a task makes progress. It would make sense to do this if progress was also available. Since we're not doing that at the moment - calling it out on the API javadocs will be useful.

- The change isn't thread safe if getStatistics can be called concurrently - since it's modifying stats under a read lock. I don't think that's possible right now since it's invoked from the VertexManager or when the Vertex completes under a write lock. This would need it's own lock while constructing stats if concurrent calls are possible - for example in the future if the Edge is allowed to access stats, or stats are pushed to consumers. This should ideally be documented.

> Consider scanning unfinished tasks in VertexImpl::constructStatistics to reduce merge overhead
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2565
>                 URL: https://issues.apache.org/jira/browse/TEZ-2565
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2565.1.patch
>
>
> constructStatistics() can be an overhead (scanning all tasks and merging stats) depending on the number of invocations to Vertex::getStatistics().  Consider scanning only unfinished tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)