Posted to issues@spark.apache.org by "Tao Li (JIRA)" <ji...@apache.org> on 2015/04/07 10:50:12 UTC

[jira] [Updated] (SPARK-6737) OutputCommitCoordinator.authorizedCommittersByStage map out of memory

     [ https://issues.apache.org/jira/browse/SPARK-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Li updated SPARK-6737:
--------------------------
    Description: 
I am running Spark Streaming (1.3.1) as a long-running service, and it ran out of memory after running for 7 weeks.

I found that the field authorizedCommittersByStage in the OutputCommitCoordinator class causes the OOM.
authorizedCommittersByStage is a map whose key is a StageId and whose value is a Map[PartitionId, TaskAttemptId]. The OutputCommitCoordinator class has a method stageEnd that removes a stageId from authorizedCommittersByStage, but stageEnd is never called by DAGScheduler. As a result, the per-stage entries in authorizedCommittersByStage are never cleaned up, and the map grows until the OOM occurs.
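
The growth pattern described above can be illustrated with a minimal Scala sketch. This is not Spark's actual code: the Coordinator class, the stageStart and authorize methods, the runStages helper, and the type aliases are hypothetical stand-ins, and only the names authorizedCommittersByStage and stageEnd come from the report. The sketch assumes one map entry is created per stage and shows that the entry count only shrinks when stageEnd is called.

```scala
import scala.collection.mutable

// Simplified, hypothetical stand-in for OutputCommitCoordinator; not Spark's code.
object CommitCoordinatorSketch {
  type StageId = Int
  type PartitionId = Int
  type TaskAttemptId = Long

  final class Coordinator {
    // stage -> (partition -> task attempt authorized to commit)
    val authorizedCommittersByStage =
      mutable.Map.empty[StageId, mutable.Map[PartitionId, TaskAttemptId]]

    // Hypothetical: register an empty committer map when a stage starts.
    def stageStart(stage: StageId): Unit =
      authorizedCommittersByStage(stage) = mutable.Map.empty

    // Hypothetical: the first attempt to ask for a partition is authorized.
    def authorize(stage: StageId, partition: PartitionId, attempt: TaskAttemptId): Boolean =
      authorizedCommittersByStage.get(stage) match {
        case Some(committers) if !committers.contains(partition) =>
          committers(partition) = attempt
          true
        case _ => false
      }

    // The cleanup step; if the scheduler never calls it, entries are never removed.
    def stageEnd(stage: StageId): Unit =
      authorizedCommittersByStage.remove(stage)
  }

  // Runs numStages stages and returns how many entries remain in the map.
  def runStages(numStages: Int, callStageEnd: Boolean): Int = {
    val coordinator = new Coordinator
    for (stage <- 0 until numStages) {
      coordinator.stageStart(stage)
      coordinator.authorize(stage, partition = 0, attempt = stage.toLong)
      if (callStageEnd) coordinator.stageEnd(stage)
    }
    coordinator.authorizedCommittersByStage.size
  }
}
```

In a streaming job every batch creates new stages, so without the stageEnd call the number of retained entries would be expected to grow with the total number of stages ever run, which matches a service that only fails after weeks of uptime.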

This happens in my Spark Streaming program (1.3.1); I am not sure whether it also appears in other Spark components or other Spark versions.


> OutputCommitCoordinator.authorizedCommittersByStage map out of memory
> ---------------------------------------------------------------------
>
>                 Key: SPARK-6737
>                 URL: https://issues.apache.org/jira/browse/SPARK-6737
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Streaming
>    Affects Versions: 1.3.0
>         Environment: spark 1.3.1
>            Reporter: Tao Li
>            Priority: Critical
>              Labels: Bug, Core, DAGScheduler, OOM, Streaming


