Posted to dev@spark.apache.org by Jeff Zhang <zj...@gmail.com> on 2015/08/11 11:25:24 UTC

Is OutputCommitCoordinator necessary for all the stages?

As I understand it, OutputCommitCoordinator should only be necessary for
ResultStage (especially a ResultStage that writes to HDFS), but currently it
is used for all stages. Is there any reason for that?

-- 
Best Regards

Jeff Zhang

Re: Is OutputCommitCoordinator necessary for all the stages?

Posted by Jeff Zhang <zj...@gmail.com>.

Hi Josh,

I mean on the driver side. OutputCommitCoordinator.startStage is called in
DAGScheduler#submitMissingTasks for all stages, which costs some memory.
That said, as long as the executor side doesn't make the RPC, there isn't
much of a performance penalty.
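
To make the driver-side point concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the class and method names (ToyCommitCoordinator, start_stage, can_commit, stage_end) are hypothetical and do not mirror Spark's actual code. It shows per-stage state being allocated for every stage that starts, while commit requests only arrive from tasks that actually try to commit.

```python
# Toy model only: names here are hypothetical, not Spark's API.
from typing import Dict, List, Optional


class ToyCommitCoordinator:
    """Driver-side bookkeeping: one slot per partition of each started stage."""

    def __init__(self) -> None:
        # stage id -> winning attempt per partition (None until someone commits)
        self.stages: Dict[int, List[Optional[int]]] = {}
        self.rpc_count = 0  # number of commit requests that reached the driver

    def start_stage(self, stage_id: int, num_partitions: int) -> None:
        # Called for every stage, so memory grows with the partition count.
        self.stages[stage_id] = [None] * num_partitions

    def can_commit(self, stage_id: int, partition: int, attempt: int) -> bool:
        self.rpc_count += 1
        winners = self.stages.get(stage_id)
        if winners is None:
            return False  # stage unknown or already finished
        if winners[partition] is None:
            winners[partition] = attempt  # first attempt to ask wins
            return True
        return False  # another attempt already holds the commit

    def stage_end(self, stage_id: int) -> None:
        # Release the per-stage state once the stage is done.
        self.stages.pop(stage_id, None)


coordinator = ToyCommitCoordinator()
coordinator.start_stage(stage_id=0, num_partitions=4)  # e.g. a shuffle map stage
coordinator.start_stage(stage_id=1, num_partitions=2)  # e.g. a stage that writes output

# Only tasks of the writing stage ask for permission to commit.
first = coordinator.can_commit(stage_id=1, partition=0, attempt=0)
duplicate = coordinator.can_commit(stage_id=1, partition=0, attempt=1)

tracked_slots = sum(len(w) for w in coordinator.stages.values())
print(first, duplicate, tracked_slots, coordinator.rpc_count)
```

In this model the four slots for stage 0 are never consulted, which is the memory cost in question, while the request counter only moves for the stage that commits.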

On Wed, Aug 12, 2015 at 12:17 AM, Josh Rosen <ro...@gmail.com> wrote:

> Can you clarify what you mean by "used for all stages"?
> OutputCommitCoordinator RPCs should only be initiated through
> SparkHadoopMapRedUtil.commitTask(), so while the OutputCommitCoordinator
> doesn't make a distinction between ShuffleMapStages and ResultStages, there
> still should not be a performance penalty for this, because the extra rounds
> of RPCs should only be performed when necessary.
>
>
> On 8/11/15 2:25 AM, Jeff Zhang wrote:
>
>> As I understand it, OutputCommitCoordinator should only be necessary for
>> ResultStage (especially a ResultStage that writes to HDFS), but currently it
>> is used for all stages. Is there any reason for that?
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>


-- 
Best Regards

Jeff Zhang

Re: Is OutputCommitCoordinator necessary for all the stages?

Posted by Josh Rosen <ro...@gmail.com>.

Can you clarify what you mean by "used for all stages"?
OutputCommitCoordinator RPCs should only be initiated through
SparkHadoopMapRedUtil.commitTask(), so while the OutputCommitCoordinator
doesn't make a distinction between ShuffleMapStages and ResultStages,
there still should not be a performance penalty for this, because the
extra rounds of RPCs should only be performed when necessary.
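
The "only when necessary" behaviour can be sketched with a small Python toy. This is an illustration under assumptions, not the real SparkHadoopMapRedUtil.commitTask(): ToyCommitter, commit_task and ask_driver are hypothetical stand-ins, and the check for pending output is modelled loosely on the idea that a committer can report whether it has anything to commit.

```python
# Toy sketch only: hypothetical stand-ins, not Spark or Hadoop APIs.
from typing import Callable, List


class ToyCommitter:
    """Stands in for an output committer that may or may not have output."""

    def __init__(self, has_output: bool) -> None:
        self.has_output = has_output
        self.committed = False

    def needs_task_commit(self) -> bool:
        return self.has_output

    def commit_task(self) -> None:
        self.committed = True

    def abort_task(self) -> None:
        self.committed = False


def commit_task(committer: ToyCommitter, ask_driver: Callable[[], bool]) -> str:
    """Commit task output, contacting the driver only if there is output."""
    if not committer.needs_task_commit():
        return "skipped"  # nothing to commit, so no RPC is made
    if ask_driver():  # the only place a round trip to the driver happens
        committer.commit_task()
        return "committed"
    committer.abort_task()
    return "denied"


rpc_calls: List[str] = []


def ask_driver() -> bool:
    # Records each simulated RPC; always grants permission in this toy.
    rpc_calls.append("canCommit")
    return True


no_output = commit_task(ToyCommitter(has_output=False), ask_driver)
with_output = commit_task(ToyCommitter(has_output=True), ask_driver)
print(no_output, with_output, len(rpc_calls))
```

A task that never enters this commit path, or that has nothing to commit, makes no request at all, so the extra round trip is paid only by tasks that write output.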

On 8/11/15 2:25 AM, Jeff Zhang wrote:
> As I understand it, OutputCommitCoordinator should only be necessary
> for ResultStage (especially a ResultStage that writes to HDFS), but
> currently it is used for all stages. Is there any reason for that?
>
> -- 
> Best Regards
>
> Jeff Zhang
