Posted to issues@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2015/05/15 03:34:59 UTC

[jira] [Updated] (MESOS-2507) Performance issue in the master when a large number of slaves are registering.

     [ https://issues.apache.org/jira/browse/MESOS-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-2507:
-----------------------------------
          Sprint: Twitter Q2 Sprint 3 - 5/11
        Assignee: Benjamin Mahler
    Story Points: 5

> Performance issue in the master when a large number of slaves are registering.
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-2507
>                 URL: https://issues.apache.org/jira/browse/MESOS-2507
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>              Labels: scalability, twitter
>
> In large clusters, when many slaves register at the same time, the master becomes backlogged processing registration requests. Profiling with {{perf}} revealed the following:
> {code}
> Events: 14K cycles
>  25.44%  libmesos-0.22.0-x.so  [.] mesos::internal::master::Master::registerSlave(process::UPID const&, mesos::SlaveInfo const&, std::vector<mesos::Resource, std::allocator<mesos::Resource> > cons
>  11.18%  libmesos-0.22.0-x.so  [.] pipecb
>   5.88%  libc-2.5.so             [.] malloc_consolidate
>   5.33%  libc-2.5.so             [.] _int_free
>   5.25%  libc-2.5.so             [.] malloc
>   5.23%  libc-2.5.so             [.] _int_malloc
>   4.11%  libstdc++.so.6.0.8      [.] std::string::assign(std::string const&)
>   3.22%  libmesos-0.22.0-x.so  [.] mesos::Resource::SharedDtor()
>   3.10%  [kernel]                [k] _raw_spin_lock
>   1.97%  libmesos-0.22.0-x.so  [.] mesos::Attribute::SharedDtor()
>   1.28%  libc-2.5.so             [.] memcmp
>   1.08%  libc-2.5.so             [.] free
> {code}
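> This is likely because we loop over all registered slaves on every registration request, so the cost of each request grows linearly with the number of slaves already registered: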
> {code}
> void Master::registerSlave(
>     const UPID& from,
>     const SlaveInfo& slaveInfo,
>     const vector<Resource>& checkpointedResources,
>     const string& version)
> {
>   // ...
>   // Check if this slave is already registered (because it retries).
>   foreachvalue (Slave* slave, slaves.registered) {
>     if (slave->pid == from) {
>       // ...
>     }
>   }
>   // ...
> }
> {code}
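
One possible way to avoid the per-registration scan is to keep a secondary index keyed by the slave's PID, so the "already registered" check becomes an average constant-time hash lookup. The following is a minimal sketch under stated assumptions, not the actual Mesos change: {{RegisteredSlaves}}, {{findByPid}}, and the use of {{std::string}} in place of {{process::UPID}} are hypothetical names chosen for illustration.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the master's Slave struct (not the Mesos type).
struct Slave {
  std::string id;
  std::string pid;  // Assumption: a string stands in for process::UPID.
};

// Hypothetical container that owns slaves by id and also indexes them by pid.
class RegisteredSlaves {
public:
  // Registers a slave. Returns false if the id or the pid is already present,
  // which is how a retried registration would be detected.
  bool add(const std::string& id, const std::string& pid) {
    if (byId.count(id) > 0 || byPid.count(pid) > 0) {
      return false;
    }
    std::unique_ptr<Slave> slave(new Slave{id, pid});
    byPid[pid] = slave.get();
    byId[id] = std::move(slave);
    return true;
  }

  // Average O(1) lookup by pid, replacing a loop over every registered slave.
  Slave* findByPid(const std::string& pid) const {
    auto it = byPid.find(pid);
    return it == byPid.end() ? nullptr : it->second;
  }

  // Removes a slave and keeps both indexes consistent.
  void remove(const std::string& id) {
    auto it = byId.find(id);
    if (it == byId.end()) {
      return;
    }
    byPid.erase(it->second->pid);
    byId.erase(it);
  }

private:
  std::unordered_map<std::string, std::unique_ptr<Slave>> byId;
  std::unordered_map<std::string, Slave*> byPid;  // Non-owning index.
};
```

The trade-off in this sketch is extra memory and the need to update both maps on every add and remove, in exchange for not touching every slave on each registration request.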


