Posted to issues@mesos.apache.org by "Chengwei Yang (JIRA)" <ji...@apache.org> on 2014/08/29 09:11:53 UTC

[jira] [Created] (MESOS-1746) clear TaskStatus data to avoid OOM

Chengwei Yang created MESOS-1746:
------------------------------------

             Summary: clear TaskStatus data to avoid OOM
                 Key: MESOS-1746
                 URL: https://issues.apache.org/jira/browse/MESOS-1746
             Project: Mesos
          Issue Type: Bug
         Environment: mesos-0.19.0
            Reporter: Chengwei Yang
            Assignee: Chengwei Yang


Spark on Mesos may use TaskStatus to transfer computed results from the worker to the scheduler, as in the following source code (Spark 1.0.2):

{code}
        val serializedResult = {
          if (serializedDirectResult.limit >= execBackend.akkaFrameSize() -
              AkkaUtils.reservedSizeBytes) {
            logInfo("Storing result for " + taskId + " in local BlockManager")
            val blockId = TaskResultBlockId(taskId)
            env.blockManager.putBytes(
              blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
            ser.serialize(new IndirectTaskResult[Any](blockId))
          } else {
            logInfo("Sending result for " + taskId + " directly to driver")
            serializedDirectResult
          }
        }
{code}

In our test environment, we increased akkaFrameSize from its default value (10MB) to 128MB, and this causes our mesos-master process to run out of memory (OOM) within tens of minutes when running Spark tasks in fine-grained mode.

As you can see, even with akkaFrameSize changed back to its default value (10MB), mesos-master is still very likely to OOM, only more slowly.
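
A rough back-of-envelope sketch of why the frame size only changes how fast memory fills up. It assumes the master holds on to the payload of some number of status updates, each just under the direct-result threshold from the Spark snippet above. The object name, the retained count, and the reserved size are illustrative assumptions, not measured values.

```scala
// Back-of-envelope estimate; every number below is an illustrative assumption.
object RetainedPayloadEstimate {
  // Upper bound on bytes held if `retainedStatuses` statuses each carry a
  // payload just under the direct-result threshold (frame size minus reserve).
  def worstCaseBytes(frameSizeBytes: Long, reservedBytes: Long, retainedStatuses: Long): Long =
    math.max(0L, frameSizeBytes - reservedBytes) * retainedStatuses

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    val reserved = 200L * 1024L // assumed stand-in for AkkaUtils.reservedSizeBytes
    val retained = 1000L        // hypothetical number of statuses still held
    for (frameMb <- Seq(10L, 128L)) {
      val bytes = worstCaseBytes(frameMb * mb, reserved, retained)
      println(s"frame=${frameMb}MB -> up to ${bytes / mb} MB retained")
    }
  }
}
```

Under these assumptions the bound scales linearly with the frame size, so a smaller frame delays the OOM rather than preventing it.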

So I think it would be good to delete the data from TaskStatus, since it is intended only for the framework running on top of Mesos and we are not interested in it.
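
The idea can be sketched roughly as below. This is only an illustration: the TaskStatus case class is a hypothetical, simplified stand-in for the real Mesos protobuf message, and withoutData is an invented helper name, not an existing Mesos function. The point is that the framework still gets the full update while the copy kept for bookkeeping carries no payload.

```scala
object StripStatusData {
  // Hypothetical, simplified stand-in for the Mesos TaskStatus message;
  // the real type is a protobuf with more fields.
  final case class TaskStatus(taskId: String, state: String, data: Option[Array[Byte]])

  // Copy intended for long-lived bookkeeping: same id and state, no payload.
  def withoutData(status: TaskStatus): TaskStatus = status.copy(data = None)

  def main(args: Array[String]): Unit = {
    val update = TaskStatus("task-1", "TASK_FINISHED", Some(new Array[Byte](1024)))
    val forwarded = update           // the framework still receives the payload
    val stored = withoutData(update) // what would be retained after clearing
    println(s"forwarded=${forwarded.data.map(_.length)} stored=${stored.data.map(_.length)}")
  }
}
```

The original update is left untouched, so forwarding and storing can use different copies without affecting each other.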



--
This message was sent by Atlassian JIRA
(v6.2#6252)