Posted to dev@mesos.apache.org by "James Zhao (JIRA)" <ji...@apache.org> on 2012/08/30 02:28:07 UTC

[jira] [Created] (MESOS-265) Master hogs CPU

James Zhao created MESOS-265:
--------------------------------

             Summary: Master hogs CPU
                 Key: MESOS-265
                 URL: https://issues.apache.org/jira/browse/MESOS-265
             Project: Mesos
          Issue Type: Bug
          Components: master
         Environment: * Cluster of 6 CentOS machines; 24 cores / 256 GB RAM each.
* Large data file (1 billion records, 100 GB) retrieved via HDFS.
* Simple jobs (e.g. count records by key, where the key takes 20 possible values) using Spark with local modifications.
            Reporter: James Zhao


CPU usage of the master slowly grows until eventually it stops working.

Upon launch, CPU usage is about 1%. After running just 1 job, it climbs to 20% (even after the job has stopped), and upon running more jobs, quickly grows to 50%. It then continues to grow at a slower rate until eventually it stops responding (after perhaps several hundred jobs).

Screenshot: http://math.stanford.edu/~jyzhao/mesos-cpu.png (there are no active jobs, but lt-mesos-master is using 75% CPU)
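
To capture the growth described above as numbers rather than a screenshot, the master's CPU usage could be sampled between jobs with something like the following. This is only a minimal sketch and not part of the original report: it assumes a Linux host (it reads /proc/<pid>/stat), and the names cpu_percent, read_cpu_ticks and sample are hypothetical helpers made up for illustration.

```python
import os
import time


def cpu_percent(ticks_before, ticks_after, elapsed_seconds, ticks_per_second):
    """Percent of one core used between two samples of utime + stime ticks."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    cpu_seconds = (ticks_after - ticks_before) / ticks_per_second
    return 100.0 * cpu_seconds / elapsed_seconds


def read_cpu_ticks(pid):
    """Return utime + stime in clock ticks for a process (Linux /proc only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (the command name) is parenthesised and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    fields = stat.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(fields[11]) + int(fields[12])


def sample(pid, interval=5.0, count=3):
    """Print CPU usage of `pid` once per interval, `count` times."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    prev_ticks, prev_time = read_cpu_ticks(pid), time.monotonic()
    for _ in range(count):
        time.sleep(interval)
        ticks, now = read_cpu_ticks(pid), time.monotonic()
        pct = cpu_percent(prev_ticks, ticks, now - prev_time, hz)
        print(f"{time.strftime('%H:%M:%S')} pid={pid} cpu={pct:.1f}%")
        prev_ticks, prev_time = ticks, now
```

Calling sample() with the PID of the lt-mesos-master process after each job would give a series of readings that can be attached to the issue alongside the log. The percentage is relative to a single core, as in top.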

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MESOS-265) Master hogs CPU

Posted by "Benjamin Mahler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444568#comment-13444568 ] 

Benjamin Mahler commented on MESOS-265:
---------------------------------------

Hi James, which version of mesos are you running?
                

[jira] [Updated] (MESOS-265) Master hogs CPU

Posted by "Benjamin Mahler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MESOS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-265:
----------------------------------

    Attachment: mesos-log.txt.zip
    

[jira] [Commented] (MESOS-265) Master hogs CPU

Posted by "Benjamin Mahler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444578#comment-13444578 ] 

Benjamin Mahler commented on MESOS-265:
---------------------------------------

Hmm, can you attach the master logs?
Look for mesos-master.log, located in the log_dir directory specified on the command line.
                

[jira] [Commented] (MESOS-265) Master hogs CPU

Posted by "James Zhao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444571#comment-13444571 ] 

James Zhao commented on MESOS-265:
----------------------------------

Hi Benjamin.

I'm using the trunk version from last week. Previously I was using trunk from about a month ago, which had the same issue.

Latest commit message:
commit 93cd1d1cae375a0a0bf71cd2a7dad153a632defe
Author: Benjamin Hindman <be...@apache.org>
Date:   Fri Aug 24 20:53:25 2012 +0000
    Bug fixes for C++11 implementation of then.
    git-svn-id: https://svn.apache.org/repos/asf/incubator/mesos/trunk@1377106 13f79535-47bb-0310-9956-ffa450edef68
                

[jira] [Commented] (MESOS-265) Master hogs CPU

Posted by "Jessica J (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444917#comment-13444917 ] 

Jessica J commented on MESOS-265:
---------------------------------

What does the resource usage on your slave nodes look like?
                

[jira] [Commented] (MESOS-265) Master hogs CPU

Posted by "James Zhao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MESOS-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444581#comment-13444581 ] 

James Zhao commented on MESOS-265:
----------------------------------

http://math.stanford.edu/~jyzhao/mesos-log.txt
                