Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2015/06/09 17:59:00 UTC

[jira] [Commented] (TEZ-2544) Incorrect dag result due to wrong TaskSpec in recovering

    [ https://issues.apache.org/jira/browse/TEZ-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579138#comment-14579138 ] 

Hitesh Shah commented on TEZ-2544:
----------------------------------

Bumping to Blocker, as this affects data correctness.

> Incorrect dag result due to wrong TaskSpec in recovering
> --------------------------------------------------------
>
>                 Key: TEZ-2544
>                 URL: https://issues.apache.org/jira/browse/TEZ-2544
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Blocker
>              Labels: Recovery
>
> Expected TaskSpec
> {noformat}
> DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1, TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0, processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor, inputSpecListSize=1, 
> outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer, physicalEdgeCount=2, inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }}, ], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1, outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput }}
> {noformat}
> The actual TaskSpec
> {noformat}
> DAGName : OrderedWordCount, VertexName: Summation, VertexParallelism: 1, TaskAttemptID:attempt_1433850314856_0019_1_01_000000_0, processorName=org.apache.tez.examples.OrderedWordCount$SumProcessor, inputSpecListSize=1, 
> outputSpecListSize=1, inputSpecList=[{{ sourceVertexName=Tokenizer, physicalEdgeCount=1, inputClassName=org.apache.tez.runtime.library.input.OrderedGroupedKVInput }}, ], outputSpecList=[{{ destinationVertexName=Sorter, physicalEdgeCount=1, outputClassName=org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput }}
> {noformat}
> The expected physicalEdgeCount is 2, but the actual value is 1. This happens when dynamic parallelism estimation is enabled.
> The cause is that the Task is being recovered while its vertex's source edge manager has not yet been updated from ScatterGatherEdgeManager to CustomShuffleEdgeManager, which results in a different physicalEdgeCount for the InputSpec.
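
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Tez code: every class and method name below (EdgeCountSketch, EdgeManager, ScatterGatherLike, CustomShuffleLike, Vertex) is hypothetical, and the numbers (one source task, two partitions merged into one destination task) are chosen only so that the counts match the 1 and 2 seen in the TaskSpecs. The real edge managers are more involved than this.

```java
// Toy model only: none of these types are Tez classes.
final class EdgeCountSketch {

    /** Hypothetical stand-in for an edge manager. */
    interface EdgeManager {
        /** Number of physical inputs a single destination task reads. */
        int physicalInputsPerDestinationTask();
    }

    /** Scatter-gather style: one physical input per source task. */
    static final class ScatterGatherLike implements EdgeManager {
        private final int numSourceTasks;

        ScatterGatherLike(int numSourceTasks) {
            this.numSourceTasks = numSourceTasks;
        }

        @Override
        public int physicalInputsPerDestinationTask() {
            return numSourceTasks;
        }
    }

    /** After parallelism is reduced, each destination task reads several partitions per source task. */
    static final class CustomShuffleLike implements EdgeManager {
        private final int numSourceTasks;
        private final int partitionsPerDestinationTask;

        CustomShuffleLike(int numSourceTasks, int partitionsPerDestinationTask) {
            this.numSourceTasks = numSourceTasks;
            this.partitionsPerDestinationTask = partitionsPerDestinationTask;
        }

        @Override
        public int physicalInputsPerDestinationTask() {
            return numSourceTasks * partitionsPerDestinationTask;
        }
    }

    /** Hypothetical vertex whose input spec is derived from its current source edge manager. */
    static final class Vertex {
        private EdgeManager sourceEdgeManager;

        Vertex(EdgeManager initial) {
            this.sourceEdgeManager = initial;
        }

        void setSourceEdgeManager(EdgeManager updated) {
            this.sourceEdgeManager = updated;
        }

        int inputSpecPhysicalEdgeCount() {
            return sourceEdgeManager.physicalInputsPerDestinationTask();
        }
    }

    public static void main(String[] args) {
        // Assumed numbers: 1 source task, 2 partitions merged into 1 destination task.
        Vertex summation = new Vertex(new ScatterGatherLike(1));

        // Task spec built during recovery, before the edge manager is swapped.
        int staleCount = summation.inputSpecPhysicalEdgeCount();

        // The edge manager update that should have been applied first.
        summation.setSourceEdgeManager(new CustomShuffleLike(1, 2));
        int correctCount = summation.inputSpecPhysicalEdgeCount();

        System.out.println("before update: " + staleCount);   // prints "before update: 1"
        System.out.println("after update: " + correctCount);  // prints "after update: 2"
    }
}
```

In this model the only fix is ordering: the edge manager update has to be applied before the recovered task's spec is built. Whether the actual fix in Tez takes that form is not stated in this message.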



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)