Posted to dev@tez.apache.org by "Peter Slawski (JIRA)" <ji...@apache.org> on 2016/07/18 23:59:20 UTC

[jira] [Created] (TEZ-3356) Fix initializing of stats when custom ShuffleVertexManager is used

Peter Slawski created TEZ-3356:
----------------------------------

             Summary: Fix initializing of stats when custom ShuffleVertexManager is used
                 Key: TEZ-3356
                 URL: https://issues.apache.org/jira/browse/TEZ-3356
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.8.4
            Reporter: Peter Slawski


When a custom ShuffleVertexManager is used to set a vertex’s parallelism, the partition stats field is left uninitialized even after the manager itself has been initialized. As a result, an IllegalStateException is thrown when VertexManagerEvents are processed at the start of the vertex, because the stats field has not yet been initialized. Note that these events contain partition sizes, which are aggregated and stored in this stats field.
 
Apache Pig’s grace auto-parallelism feature uses a custom ShuffleVertexManager that sets a vertex’s parallelism upon the completion of one of its parent’s parents. It therefore hits this corner case, and Pig scripts with grace parallelism enabled fail if the DAG contains at least one vertex that has grandparents.
 
The fix should be straightforward: update pending tasks before, rather than after, VertexManagerEvents are processed, to ensure the partition stats field is initialized.
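
To illustrate the ordering problem, here is a minimal, self-contained Java sketch. It is not Tez code: the class PartitionStatsSketch and all of its methods are hypothetical stand-ins, and it assumes only what is described above, namely that updating pending tasks is the step that initializes the stats field and that event handling fails if the field is still unset.

```java
// Hypothetical sketch of the ordering issue; names do not mirror Tez's
// actual ShuffleVertexManager implementation.
class PartitionStatsSketch {
    private final int numPartitions;

    // Stays null until updatePendingTasks() runs, mimicking the
    // uninitialized partition stats field described in this issue.
    private long[] partitionStats;

    PartitionStatsSketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Stand-in for the step that (by assumption) allocates the stats field.
    void updatePendingTasks() {
        if (partitionStats == null) {
            partitionStats = new long[numPartitions];
        }
    }

    // Stand-in for handling one VertexManagerEvent carrying a partition size.
    void onVertexManagerEvent(int partition, long size) {
        if (partitionStats == null) {
            throw new IllegalStateException("partition stats not initialized");
        }
        partitionStats[partition] += size;
    }

    // Problematic order: events are processed first, pending tasks afterwards.
    // Each event is a {partitionIndex, sizeInBytes} pair.
    void startEventsFirst(long[][] events) {
        for (long[] event : events) {
            onVertexManagerEvent((int) event[0], event[1]);
        }
        updatePendingTasks();
    }

    // Proposed order: pending tasks are updated before events are processed.
    void startPendingTasksFirst(long[][] events) {
        updatePendingTasks();
        for (long[] event : events) {
            onVertexManagerEvent((int) event[0], event[1]);
        }
    }

    // Aggregated size recorded for one partition.
    long statFor(int partition) {
        return partitionStats[partition];
    }
}
```

In this sketch, startEventsFirst throws IllegalStateException on the first event, while startPendingTasksFirst aggregates the partition sizes as intended. The real change in Tez may differ in detail.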
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)