Posted to issues@mesos.apache.org by "Michael Park (JIRA)" <ji...@apache.org> on 2017/05/02 22:52:04 UTC

[jira] [Commented] (MESOS-7123) Investigate splitting offer messages instead of sending a giant single resource offer message.

    [ https://issues.apache.org/jira/browse/MESOS-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993939#comment-15993939 ] 

Michael Park commented on MESOS-7123:
-------------------------------------

[~anandmazumdar]: Pushing this off to target 1.4.0. Please let me know if this is a blocker for 1.3.0.

> Investigate splitting offer messages instead of sending a giant single resource offer message.
> ----------------------------------------------------------------------------------------------
>
>                 Key: MESOS-7123
>                 URL: https://issues.apache.org/jira/browse/MESOS-7123
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Anand Mazumdar
>            Priority: Critical
>              Labels: mesosphere
>
> Currently, the Mesos master batches all resource offers into a single message and sends it to the scheduler. For large clusters this is problematic, because the message can exceed protobuf's default maximum message size (64 MiB, i.e. 67108864 bytes). When such a message reaches the scheduler, it is dropped with a libprotobuf error, followed by a failed invariant check.
> {noformat}
> [libprotobuf ERROR google/protobuf/io/coded_stream.cc:180] A protocol message was rejected because it was too big (more than 67108864 bytes).  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
> F0213 21:33:57.658892 60996 sched.cpp:895] Check failed: offers.size() == pids.size() (32664 vs. 0)
> *** Check failure stack trace: ***
>     @     0x7f8d1b4d69bd  (unknown)
>     @     0x7f8d1b4d8750  (unknown)
>     @     0x7f8d1b4d6582  (unknown)
>     @     0x7f8d1b4d90e9  (unknown)
>     @     0x7f8d1aaa646c  (unknown)
>     @     0x7f8d1aaa7df7  (unknown)
>     @     0x7f8d1aa8ee4a  (unknown)
>     @     0x7f8d1aa9d109  (unknown)
>     @     0x7f8d1b46e4e4  (unknown)
>     @     0x7f8d1b46e827  (unknown)
>     @     0x7f8e319b0220  (unknown)
>     @     0x7f8e3355ddc5  start_thread
>     @     0x7f8e32c62ced  __clone
>     @              (nil)  (unknown)
> {noformat}
> Possible solutions are for the Mesos master to either batch the offers (e.g., 100 offers per message) or use an N:1 mapping (i.e., 1 offer per message). The batch size could be set via a master flag at startup, with a reasonable default value.
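
A minimal sketch of the batching idea described above, in Python for illustration only. It is not Mesos code: the function name `split_offers`, the `batch_size` parameter, and the plain lists standing in for offers and scheduler PIDs are all hypothetical. It assumes offers and PIDs are parallel lists of equal length (as the failed check in the log suggests) and splits them in lockstep so each message stays consistent.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

O = TypeVar("O")
P = TypeVar("P")


def split_offers(
    offers: Sequence[O],
    pids: Sequence[P],
    batch_size: int = 100,  # hypothetical default; the issue only suggests "e.g., 100"
) -> Iterator[Tuple[List[O], List[P]]]:
    """Yield (offers, pids) chunks holding at most batch_size entries each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if len(offers) != len(pids):
        raise ValueError("offers and pids must have the same length")
    # Slice both lists with the same bounds so every chunk keeps them aligned.
    for start in range(0, len(offers), batch_size):
        end = start + batch_size
        yield list(offers[start:end]), list(pids[start:end])
```

With `batch_size=1` this degenerates to the one-offer-per-message variant. A fixed count does not by itself bound the serialized size of a message, so a real implementation might also need to cap batches by byte size.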



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)