Posted to user@mesos.apache.org by "Nastooh Avessta (navesta)" <na...@cisco.com> on 2015/07/18 00:46:53 UTC

Mesos Slave Failover time

Hi
I am trying to bring the slave failover detection time below 10 seconds, but I can't find the right set of parameters. Currently, it takes around a minute and a half for the master to detect that a slave has gone offline, which seems to correspond to slave_ping_timeout=15 * max_slave_ping_timeouts=5 (15 seconds times 5 pings, about 75 seconds). However, I can't find these parameters in mesos-master:

# mesos-master --version
mesos 0.22.1
# mesos-master --help
Usage: mesos-master [...]

Supported options:
  --acls=VALUE                             The value could be a JSON formatted string of ACLs
                                           or a file path containing the JSON formatted ACLs used
                                           for authorization. Path could be of the form 'file:///path/to/file'
                                           or '/path/to/file'.

                                           See the ACLs protobuf in mesos.proto for the expected format.

                                           Example:
                                           {
                                             "register_frameworks": [
                                                                  {
                                                                     "principals": { "type": "ANY" },
                                                                     "roles": { "values": ["a"] }
                                                                  }
                                                                ],
                                             "run_tasks": [
                                                             {
                                                                "principals": { "values": ["a", "b"] },
                                                                "users": { "values": ["c"] }
                                                             }
                                                           ],
                                             "shutdown_frameworks": [
                                                           {
                                                              "principals": { "values": ["a", "b"] },
                                                              "framework_principals": { "values": ["c"] }
                                                           }
                                                         ]
                                           }
  --allocation_interval=VALUE              Amount of time to wait between performing
                                            (batch) allocations (e.g., 500ms, 1sec, etc). (default: 1secs)
  --[no-]authenticate                      If authenticate is 'true' only authenticated frameworks are allowed
                                           to register. If 'false' unauthenticated frameworks are also
                                           allowed to register. (default: false)
  --[no-]authenticate_slaves               If 'true' only authenticated slaves are allowed to register.
                                           If 'false' unauthenticated slaves are also allowed to register. (default: false)
  --authenticators=VALUE                   Authenticator implementation to use when authenticating frameworks
                                           and/or slaves. Use the default 'crammd5', or
                                           load an alternate authenticator module using --modules. (default: crammd5)
  --cluster=VALUE                          Human readable name for the cluster,
                                           displayed in the webui.
  --credentials=VALUE                      Either a path to a text file with a list of credentials,
                                           each line containing 'principal' and 'secret' separated by whitespace,
                                           or, a path to a JSON-formatted file containing credentials.
                                           Path could be of the form 'file:///path/to/file' or '/path/to/file'.
                                           JSON file Example:
                                           {
                                             "credentials": [
                                                               {
                                                                  "principal": "sherman",
                                                                  "secret": "kitesurf",
                                                               }
                                                              ]
                                           }
                                           Text file Example:
                                           username secret

  --external_log_file=VALUE                Specified the externally managed log file. This file will be
                                           exposed in the webui and HTTP api. This is useful when using
                                           stderr logging as the log file is otherwise unknown to Mesos.
  --framework_sorter=VALUE                 Policy to use for allocating resources
                                           between a given user's frameworks. Options
                                           are the same as for user_allocator. (default: drf)
  --[no-]help                              Prints this help message (default: false)
  --hooks=VALUE                            A comma separated list of hook modules to be
                                           installed inside master.
  --hostname=VALUE                         The hostname the master should advertise in ZooKeeper.
                                           If left unset, the hostname is resolved from the IP address
                                           that the master binds to.
  --[no-]initialize_driver_logging         Whether to automatically initialize google logging of scheduler
                                           and/or executor drivers. (default: true)
  --ip=VALUE                               IP address to listen on
  --[no-]log_auto_initialize               Whether to automatically initialize the replicated log used for the
                                           registry. If this is set to false, the log has to be manually
                                           initialized when used for the very first time. (default: true)
  --log_dir=VALUE                          Directory path to put log files (no default, nothing
                                           is written to disk unless specified;
                                           does not affect logging to stderr).
                                           NOTE: 3rd party log messages (e.g. ZooKeeper) are
                                           only written to stderr!

  --logbufsecs=VALUE                       How many seconds to buffer log messages for (default: 0)
  --logging_level=VALUE                    Log message at or above this level; possible values:
                                           'INFO', 'WARNING', 'ERROR'; if quiet flag is used, this
                                           will affect just the logs from log_dir (if specified) (default: INFO)
  --modules=VALUE                          List of modules to be loaded and be available to the internal
                                           subsystems.

                                           Use --modules=filepath to specify the list of modules via a
                                           file containing a JSON formatted string. 'filepath' can be
                                           of the form 'file:///path/to/file' or '/path/to/file'.

                                           Use --modules="{...}" to specify the list of modules inline.

                                           Example:
                                           {
                                             "libraries": [
                                               {
                                                 "file": "/path/to/libfoo.so",
                                                 "modules": [
                                                   {
                                                     "name": "org_apache_mesos_bar",
                                                     "parameters": [
                                                       {
                                                         "key": "X",
                                                         "value": "Y"
                                                       }
                                                     ]
                                                   },
                                                   {
                                                     "name": "org_apache_mesos_baz"
                                                   }
                                                 ]
                                               },
                                               {
                                                 "name": "qux",
                                                 "modules": [
                                                   {
                                                     "name": "org_apache_mesos_norf"
                                                   }
                                                 ]
                                               }
                                             ]
                                           }
  --offer_timeout=VALUE                    Duration of time before an offer is rescinded from a framework.
                                           This helps fairness when running frameworks that hold on to offers,
                                           or frameworks that accidentally drop offers.
  --port=VALUE                             Port to listen on (default: 5050)
  --[no-]quiet                             Disable logging to stderr (default: false)
  --quorum=VALUE                           The size of the quorum of replicas when using 'replicated_log' based
                                           registry. It is imperative to set this value to be a majority of
                                           masters i.e., quorum > (number of masters)/2.
  --rate_limits=VALUE                      The value could be a JSON formatted string of rate limits
                                           or a file path containing the JSON formatted rate limits used
                                           for framework rate limiting.
                                           Path could be of the form 'file:///path/to/file'
                                           or '/path/to/file'.

                                           See the RateLimits protobuf in mesos.proto for the expected format.

                                           Example:
                                           {
                                             "limits": [
                                               {
                                                 "principal": "foo",
                                                 "qps": 55.5
                                               },
                                               {
                                                 "principal": "bar"
                                               }
                                             ],
                                             "aggregate_default_qps": 33.3
                                           }
  --recovery_slave_removal_limit=VALUE     For failovers, limit on the percentage of slaves that can be removed
                                           from the registry *and* shutdown after the re-registration timeout
                                           elapses. If the limit is exceeded, the master will fail over rather
                                           than remove the slaves.
                                           This can be used to provide safety guarantees for production
                                           environments. Production environments may expect that across Master
                                           failovers, at most a certain percentage of slaves will fail
                                           permanently (e.g. due to rack-level failures).
                                           Setting this limit would ensure that a human needs to get
                                           involved should an unexpected widespread failure of slaves occur
                                           in the cluster.
                                           Values: [0%-100%] (default: 100%)
  --registry=VALUE                         Persistence strategy for the registry;
                                           available options are 'replicated_log', 'in_memory' (for testing). (default: replicated_log)
  --registry_fetch_timeout=VALUE           Duration of time to wait in order to fetch data from the registry
                                           after which the operation is considered a failure. (default: 1mins)
  --registry_store_timeout=VALUE           Duration of time to wait in order to store data in the registry
                                           after which the operation is considered a failure. (default: 5secs)
  --[no-]registry_strict                   Whether the Master will take actions based on the persistent
                                           information stored in the Registry. Setting this to false means
                                           that the Registrar will never reject the admission, readmission,
                                           or removal of a slave. Consequently, 'false' can be used to
                                           bootstrap the persistent state on a running cluster.
                                           NOTE: This flag is *experimental* and should not be used in
                                           production yet. (default: false)
  --roles=VALUE                            A comma separated list of the allocation
                                           roles that frameworks in this cluster may
                                           belong to.
  --[no-]root_submissions                  Can root submit frameworks? (default: true)
  --slave_removal_rate_limit=VALUE         The maximum rate (e.g., 1/10mins, 2/3hrs, etc) at which slaves will
                                           be removed from the master when they fail health checks. By default
                                           slaves will be removed as soon as they fail the health checks.
                                           The value is of the form <Number of slaves>/<Duration>.
  --slave_reregister_timeout=VALUE         The timeout within which all slaves are expected to re-register
                                           when a new master is elected as the leader. Slaves that do not
                                           re-register within the timeout will be removed from the registry
                                           and will be shutdown if they attempt to communicate with master.
                                           NOTE: This value has to be atleast 10mins. (default: 10mins)
  --user_sorter=VALUE                      Policy to use for allocating resources
                                           between users. May be one of:
                                             dominant_resource_fairness (drf) (default: drf)
  --[no-]version                           Show version and exit. (default: false)
  --webui_dir=VALUE                        Directory path of the webui files/assets (default: /usr/share/mesos/webui)
  --weights=VALUE                          A comma separated list of role/weight pairs
                                           of the form 'role=weight,role=weight'. Weights
                                           are used to indicate forms of priority.
  --whitelist=VALUE                        Path to a file with a list of slaves
                                           (one per line) to advertise offers for.
                                           Path could be of the form 'file:///path/to/file' or '/path/to/file'.
  --work_dir=VALUE                         Directory path to store the persistent information stored in the
                                           Registry. (example: /var/lib/mesos/master)
  --zk=VALUE                               ZooKeeper URL (used for leader election amongst masters)
                                           May be one of:
                                             zk://host1:port1,host2:port2,.../path
                                             zk://username:password@host1:port1,host2:port2,.../path
                                             file:///path/to/file (where file contains one of the above)
  --zk_session_timeout=VALUE               ZooKeeper session timeout. (default: 10secs)

Furthermore, setting these parameters, either in /etc/mesos-master/ or inline, generates the following error:
# /usr/sbin/mesos-master --zk=zk://10.40.50.228:2181/mesos --port=5050 --log_dir=/var/log/mesos --hostname=10.40.50.228 --ip=10.40.50.228 --quorum=1 --work_dir=/var/lib/mesos --max_slave_ping_timeouts=2
Failed to load unknown flag 'max_slave_ping_timeouts'
Usage: mesos-master [...]

Supported options:
  --acls=VALUE                             The valu
...
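
Since an unknown flag makes mesos-master exit at startup, it can help to check the --help text for a flag before passing it. Below is a minimal Python sketch, assuming the help layout shown above (each option starts a line with exactly two spaces, followed by `--name=VALUE` or `--[no-]name`). The helper name `supported_flags` is hypothetical and not part of Mesos.

```python
import re

# Matches option lines such as "  --quorum=VALUE ..." or "  --[no-]quiet ...".
# Continuation lines are indented much further, so they are not matched.
OPTION_RE = re.compile(r"^  --(?:\[no-\])?([a-z_]+)", re.MULTILINE)


def supported_flags(help_text):
    """Return the set of flag names listed in mesos-master --help output."""
    return set(OPTION_RE.findall(help_text))


# A few lines copied from the 0.22.1 help output above.
sample = """Supported options:
  --port=VALUE                             Port to listen on (default: 5050)
  --[no-]quiet                             Disable logging to stderr (default: false)
  --zk_session_timeout=VALUE               ZooKeeper session timeout. (default: 10secs)
"""

flags = supported_flags(sample)
print("max_slave_ping_timeouts" in flags)  # False
print("zk_session_timeout" in flags)       # True
```

In practice the text could be captured from `mesos-master --help`. The thread does not show whether it is printed on stdout or stderr, so it may be necessary to check both.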

Any thoughts?
Cheers,
Nastooh Avessta


Re: Mesos Slave Failover time

Posted by Adam Bordelon <ad...@mesosphere.io>.
Nastooh, the only other option right now is to recompile Mesos with those
hardcoded constants changed to your desired values. Painful, but that's why
we wanted to turn them into flags.
https://github.com/apache/mesos/blob/0.22.1/src/master/constants.cpp#L34
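
For reference, the detection delay reported earlier in the thread follows from multiplying the two values. Below is a minimal Python sketch of that arithmetic, assuming the master declares a slave lost after max_slave_ping_timeouts consecutive missed pings, each waiting slave_ping_timeout seconds. The function name is hypothetical and not part of Mesos, and the second pair of values is purely illustrative.

```python
def detection_time_secs(slave_ping_timeout_secs, max_slave_ping_timeouts):
    """Approximate worst-case time for the master to notice a dead slave.

    Assumes detection takes max_slave_ping_timeouts consecutive missed
    pings, each waiting slave_ping_timeout_secs (hypothetical helper,
    not a Mesos API).
    """
    return slave_ping_timeout_secs * max_slave_ping_timeouts


# Values reported for 0.22.1 in this thread: 15 s timeout, 5 missed pings.
print(detection_time_secs(15, 5))  # 75

# Illustrative values only: one combination that would land below 10 s.
print(detection_time_secs(3, 3))   # 9
```

Very small values may make the master more likely to remove slaves during brief network hiccups, so any recompiled values are worth testing before use.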

On Fri, Jul 17, 2015 at 4:15 PM, Nastooh Avessta (navesta) <
navesta@cisco.com> wrote:

>  Thank you for your prompt reply. Any other method that could decrease
> failover time, in the meanwhile?
>
> Cheers,
>
> *Nastooh Avessta*
>
>
> *From:* Vinod Kone [mailto:vinodkone@gmail.com]
> *Sent:* Friday, July 17, 2015 4:07 PM
> *To:* user@mesos.apache.org
> *Subject:* Re: Mesos Slave Failover time
>
>
>
> It's not configurable yet, but will be in the upcoming 0.23.0 release.
>
>
>
> On Fri, Jul 17, 2015 at 3:46 PM, Nastooh Avessta (navesta) <
> navesta@cisco.com> wrote:
>
> Hi
>
> Trying to adjust the current failover time to below 10 seconds and don’t
> seem to be able to find the right set of parameters. Currently, it takes
> around a minute and a half for the master to detect that a slave has gone offline,
> which seems to correspond to
> slave_ping_timeout=15*max_slave_ping_timeouts=5. However, I can’t find
> these parameters in mesos-master:
>
>
>
>
>                                            Setting this limit would ensure
> that a human needs to get
>
>                                            involved should an unexpected
> widespread failure of slaves occur
>
>                                            in the cluster.
>
>                                            Values: [0%-100%] (default:
> 100%)
>
>   --registry=VALUE                         Persistence strategy for the
> registry;
>
>                                            available options are
> 'replicated_log', 'in_memory' (for testing). (default: replicated_log)
>
>   --registry_fetch_timeout=VALUE           Duration of time to wait in
> order to fetch data from the registry
>
>                                            after which the operation is
> considered a failure. (default: 1mins)
>
>   --registry_store_timeout=VALUE           Duration of time to wait in
> order to store data in the registry
>
>                                            after which the operation is
> considered a failure. (default: 5secs)
>
>   --[no-]registry_strict                   Whether the Master will take
> actions based on the persistent
>
>                                            information stored in the
> Registry. Setting this to false means
>
>                                            that the Registrar will never
> reject the admission, readmission,
>
>                                            or removal of a slave.
> Consequently, 'false' can be used to
>
>                                            bootstrap the persistent state
> on a running cluster.
>
>                                            NOTE: This flag is
> *experimental* and should not be used in
>
>                                            production yet. (default: false)
>
>   --roles=VALUE                            A comma separated list of the
> allocation
>
>                                            roles that frameworks in this
> cluster may
>
>                                            belong to.
>
>   --[no-]root_submissions                  Can root submit frameworks?
> (default: true)
>
>   --slave_removal_rate_limit=VALUE         The maximum rate (e.g.,
> 1/10mins, 2/3hrs, etc) at which slaves will
>
>                                            be removed from the master when
> they fail health checks. By default
>
>                                            slaves will be removed as soon
> as they fail the health checks.
>
>                                            The value is of the form
> <Number of slaves>/<Duration>.
>
>   --slave_reregister_timeout=VALUE         The timeout within which all
> slaves are expected to re-register
>
>                                            when a new master is elected as
> the leader. Slaves that do not
>
>                                            re-register within the timeout
> will be removed from the registry
>
>                                            and will be shutdown if they
> attempt to communicate with master.
>
>                                            NOTE: This value has to be
> atleast 10mins. (default: 10mins)
>
>   --user_sorter=VALUE                      Policy to use for allocating
> resources
>
>                                            between users. May be one of:
>
>                                              dominant_resource_fairness
> (drf) (default: drf)
>
>   --[no-]version                           Show version and exit.
> (default: false)
>
>   --webui_dir=VALUE                        Directory path of the webui
> files/assets (default: /usr/share/mesos/webui)
>
>   --weights=VALUE                          A comma separated list of
> role/weight pairs
>
>                                            of the form
> 'role=weight,role=weight'. Weights
>
>                                            are used to indicate forms of
> priority.
>
>   --whitelist=VALUE                        Path to a file with a list of
> slaves
>
>                                            (one per line) to advertise
> offers for.
>
>                                            Path could be of the form
> 'file:///path/to/file' or '/path/to/file'.
>
>   --work_dir=VALUE                         Directory path to store the
> persistent information stored in the
>
>                                            Registry. (example:
> /var/lib/mesos/master)
>
>   --zk=VALUE                               ZooKeeper URL (used for leader
> election amongst masters)
>
>                                            May be one of:
>
>
> zk://host1:port1,host2:port2,.../path
>
>                                              zk://username:password@host1
> :port1,host2:port2,.../path
>
>                                              file:///path/to/file (where
> file contains one of the above)
>
>   --zk_session_timeout=VALUE               ZooKeeper session timeout.
> (default: 10secs)
>
>
>
> Furthermore, setting these parameters either in /etc/mesos-master/ or inline generates the following error:
>
> # /usr/sbin/mesos-master --zk=zk://10.40.50.228:2181/mesos --port=5050 --log_dir=/var/log/mesos --hostname=10.40.50.228 --ip=10.40.50.228 --quorum=1 --work_dir=/var/lib/mesos --max_slave_ping_timeouts=2
> Failed to load unknown flag 'max_slave_ping_timeouts'
> Usage: mesos-master [...]
>
> Supported options:
>   --acls=VALUE                             The valu
> …
>
> Any thoughts?
>
> Cheers,
>

RE: Mesos Slave Failover time

Posted by "Nastooh Avessta (navesta)" <na...@cisco.com>.
Thank you for your prompt reply. Is there any other method that could decrease the failover time in the meantime?
Cheers,


From: Vinod Kone [mailto:vinodkone@gmail.com]
Sent: Friday, July 17, 2015 4:07 PM
To: user@mesos.apache.org
Subject: Re: Mesos Slave Failover time

It's not configurable yet, but will be in the upcoming 0.23.0 release.




Re: Mesos Slave Failover time

Posted by Vinod Kone <vi...@gmail.com>.
It's not configurable yet, but will be in the upcoming 0.23.0 release.
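
For reference, the detection delay described in the question can be estimated from the two values named there. The following is a minimal sketch, not Mesos code: it assumes the master gives up on a slave after max_slave_ping_timeouts consecutive pings each go unanswered for slave_ping_timeout, and the function name and the tuned values are hypothetical.

```python
from datetime import timedelta


def detection_delay(ping_timeout: timedelta, max_ping_timeouts: int) -> timedelta:
    """Approximate time before the master gives up on a silent slave.

    Assumption: the slave is declared lost after `max_ping_timeouts`
    consecutive pings each go unanswered for `ping_timeout`.
    """
    if max_ping_timeouts < 1:
        raise ValueError("max_ping_timeouts must be at least 1")
    return ping_timeout * max_ping_timeouts


# Values quoted in the question: 15 seconds per ping, 5 missed pings.
current = detection_delay(timedelta(seconds=15), 5)
print(current.total_seconds())  # 75.0

# Hypothetical tuned values, chosen only to land under the 10-second goal.
tuned = detection_delay(timedelta(seconds=3), 3)
print(tuned.total_seconds())  # 9.0
```

Whether a given release accepts values this small is not confirmed in this thread. Shorter timeouts also make it more likely that a slave is removed during a brief network interruption.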

On Fri, Jul 17, 2015 at 3:46 PM, Nastooh Avessta (navesta) <
navesta@cisco.com> wrote:

>  Hi
>
> Trying to adjust the current failover time to below 10 seconds and don’t
> seem to be able to find the right set of parameters. Currently, it takes
> around minute and half for master to detect that a slave has gone offline,
> which seems to correspond to
> slave_ping_timeout=15*max_slave_ping_timeouts=5. However, I can’t find
> these parameters in mesos-master:
>
>
>
> # mesos-master --version
>
> mesos 0.22.1
>
> #mesos-master --help
>
> Usage: mesos-master [...]
>
>
>
> Supported options:
>
>   --acls=VALUE                             The value could be a JSON
> formatted string of ACLs
>
>                                            or a file path containing the
> JSON formatted ACLs used
>
>                                            for authorization. Path could
> be of the form 'file:///path/to/file'
>
>                                            or '/path/to/file'.
>
>
>
>                                            See the ACLs protobuf in
> mesos.proto for the expected format.
>
>
>
>                                            Example:
>
>                                            {
>
>                                              "register_frameworks": [
>
>                                                                   {
>
>
> "principals": { "type": "ANY" },
>
>
> "roles": { "values": ["a"] }
>
>                                                                   }
>
>                                                                 ],
>
>                                              "run_tasks": [
>
>                                                              {
>
>
>                                                        "principals": {
> "values": ["a", "b"] },
>
>                                                                 "users": {
> "values": ["c"] }
>
>                                                              }
>
>                                                            ],
>
>                                              "shutdown_frameworks": [
>
>                                                            {
>
>
> "principals": { "values": ["a", "b"] },
>
>
> "framework_principals": { "values": ["c"] }
>
>                                                            }
>
>                                                          ]
>
>                                            }
>
>   --allocation_interval=VALUE              Amount of time to wait between
> performing
>
>                                             (batch) allocations (e.g.,
> 500ms, 1sec, etc). (default: 1secs)
>
>   --[no-]authenticate                      If authenticate is 'true' only
> authenticated frameworks are allowed
>
>                                            to register. If 'false'
> unauthenticated frameworks are also
>
>                                            allowed to register. (default:
> false)
>
>   --[no-]authenticate_slaves               If 'true' only authenticated
> slaves are allowed to register.
>
>                                            If 'false' unauthenticated
> slaves are also allowed to register. (default: false)
>
>   --authenticators=VALUE                   Authenticator implementation to
> use when authenticating frameworks
>
>                                            and/or slaves. Use the default
> 'crammd5', or
>
>                                            load an alternate authenticator
> module using --modules. (default: crammd5)
>
>   --cluster=VALUE                          Human readable name for the
> cluster,
>
>                                            displayed in the webui.
>
>   --credentials=VALUE                      Either a path to a text file with a list of credentials,
>                                            each line containing 'principal' and 'secret' separated by whitespace,
>                                            or, a path to a JSON-formatted file containing credentials.
>                                            Path could be of the form 'file:///path/to/file' or '/path/to/file'.
>                                            JSON file Example:
>                                            {
>                                              "credentials": [
>                                                {
>                                                  "principal": "sherman",
>                                                  "secret": "kitesurf",
>                                                }
>                                              ]
>                                            }
>                                            Text file Example:
>                                            username secret
>
>
>
>   --external_log_file=VALUE                Specified the externally
> managed log file. This file will be
>
>                                            exposed in the webui and HTTP
> api. This is useful when using
>
>                                            stderr logging as the log file
> is otherwise unknown to Mesos.
>
>   --framework_sorter=VALUE                 Policy to use for allocating
> resources
>
>                                            between a given user's
> frameworks. Options
>
>                                            are the same as for
> user_allocator. (default: drf)
>
>   --[no-]help                              Prints this help message
> (default: false)
>
>   --hooks=VALUE                            A comma separated list of hook
> modules to be
>
>                                            installed inside master.
>
>   --hostname=VALUE                         The hostname the master should
> advertise in ZooKeeper.
>
>                                            If left unset, the hostname is
> resolved from the IP address
>
>                                            that the master binds to.
>
>   --[no-]initialize_driver_logging         Whether to automatically
> initialize google logging of scheduler
>
>                                            and/or executor drivers.
> (default: true)
>
>   --ip=VALUE                               IP address to listen on
>
>   --[no-]log_auto_initialize               Whether to automatically
> initialize the replicated log used for the
>
>                                            registry. If this is set to
> false, the log has to be manually
>
>                                            initialized when used for the
> very first time. (default: true)
>
>   --log_dir=VALUE                          Directory path to put log files
> (no default, nothing
>
>                                            is written to disk unless
> specified;
>
>                                            does not affect logging to
> stderr).
>
>                                            NOTE: 3rd party log messages
> (e.g. ZooKeeper) are
>
>                                            only written to stderr!
>
>
>
>   --logbufsecs=VALUE                       How many seconds to buffer log
> messages for (default: 0)
>
>   --logging_level=VALUE                    Log message at or above this
> level; possible values:
>
>                                            'INFO', 'WARNING', 'ERROR'; if
> quiet flag is used, this
>
>                                            will affect just the logs from
> log_dir (if specified) (default: INFO)
>
>   --modules=VALUE                          List of modules to be loaded and be available to the internal
>                                            subsystems.
>
>                                            Use --modules=filepath to specify the list of modules via a
>                                            file containing a JSON formatted string. 'filepath' can be
>                                            of the form 'file:///path/to/file' or '/path/to/file'.
>
>                                            Use --modules="{...}" to specify the list of modules inline.
>
>                                            Example:
>                                            {
>                                              "libraries": [
>                                                {
>                                                  "file": "/path/to/libfoo.so",
>                                                  "modules": [
>                                                    {
>                                                      "name": "org_apache_mesos_bar",
>                                                      "parameters": [
>                                                        {
>                                                          "key": "X",
>                                                          "value": "Y"
>                                                        }
>                                                      ]
>                                                    },
>                                                    {
>                                                      "name": "org_apache_mesos_baz"
>                                                    }
>                                                  ]
>                                                },
>                                                {
>                                                  "name": "qux",
>                                                  "modules": [
>                                                    {
>                                                      "name": "org_apache_mesos_norf"
>                                                    }
>                                                  ]
>                                                }
>                                              ]
>                                            }
>
>   --offer_timeout=VALUE                    Duration of time before an
> offer is rescinded from a framework.
>
>                                            This helps fairness when
> running frameworks that hold on to offers,
>
>                                            or frameworks that accidentally
> drop offers.
>
>   --port=VALUE                             Port to listen on (default:
> 5050)
>
>   --[no-]quiet                             Disable logging to stderr
> (default: false)
>
>   --quorum=VALUE                           The size of the quorum of
> replicas when using 'replicated_log' based
>
>                                            registry. It is imperative to
> set this value to be a majority of
>
>                                            masters i.e., quorum > (number
> of masters)/2.
>
>   --rate_limits=VALUE                      The value could be a JSON formatted string of rate limits
>                                            or a file path containing the JSON formatted rate limits used
>                                            for framework rate limiting.
>                                            Path could be of the form 'file:///path/to/file'
>                                            or '/path/to/file'.
>
>                                            See the RateLimits protobuf in mesos.proto for the expected format.
>
>                                            Example:
>                                            {
>                                              "limits": [
>                                                {
>                                                  "principal": "foo",
>                                                  "qps": 55.5
>                                                },
>                                                {
>                                                  "principal": "bar"
>                                                }
>                                              ],
>                                              "aggregate_default_qps": 33.3
>                                            }
>
>   --recovery_slave_removal_limit=VALUE     For failovers, limit on the
> percentage of slaves that can be removed
>
>                                            from the registry *and*
> shutdown after the re-registration timeout
>
>                                            elapses. If the limit is
> exceeded, the master will fail over rather
>
>                                            than remove the slaves.
>
>                                            This can be used to provide
> safety guarantees for production
>
>                                            environments. Production
> environments may expect that across Master
>
>                                            failovers, at most a certain
> percentage of slaves will fail
>
>                                            permanently (e.g. due to
> rack-level failures).
>
>                                            Setting this limit would ensure
> that a human needs to get
>
>                                            involved should an unexpected
> widespread failure of slaves occur
>
>                                            in the cluster.
>
>                                            Values: [0%-100%] (default:
> 100%)
>
>   --registry=VALUE                         Persistence strategy for the
> registry;
>
>                                            available options are
> 'replicated_log', 'in_memory' (for testing). (default: replicated_log)
>
>   --registry_fetch_timeout=VALUE           Duration of time to wait in
> order to fetch data from the registry
>
>                                            after which the operation is
> considered a failure. (default: 1mins)
>
>   --registry_store_timeout=VALUE           Duration of time to wait in
> order to store data in the registry
>
>                                            after which the operation is
> considered a failure. (default: 5secs)
>
>   --[no-]registry_strict                   Whether the Master will take
> actions based on the persistent
>
>                                            information stored in the
> Registry. Setting this to false means
>
>                                            that the Registrar will never
> reject the admission, readmission,
>
>                                            or removal of a slave.
> Consequently, 'false' can be used to
>
>                                            bootstrap the persistent state
> on a running cluster.
>
>                                            NOTE: This flag is
> *experimental* and should not be used in
>
>                                            production yet. (default: false)
>
>   --roles=VALUE                            A comma separated list of the
> allocation
>
>                                            roles that frameworks in this
> cluster may
>
>                                            belong to.
>
>   --[no-]root_submissions                  Can root submit frameworks?
> (default: true)
>
>   --slave_removal_rate_limit=VALUE         The maximum rate (e.g.,
> 1/10mins, 2/3hrs, etc) at which slaves will
>
>                                            be removed from the master when
> they fail health checks. By default
>
>                                            slaves will be removed as soon
> as they fail the health checks.
>
>                                            The value is of the form
> <Number of slaves>/<Duration>.
>
>   --slave_reregister_timeout=VALUE         The timeout within which all
> slaves are expected to re-register
>
>                                            when a new master is elected as
> the leader. Slaves that do not
>
>                                            re-register within the timeout
> will be removed from the registry
>
>                                            and will be shutdown if they
> attempt to communicate with master.
>
>                                            NOTE: This value has to be
> atleast 10mins. (default: 10mins)
>
>   --user_sorter=VALUE                      Policy to use for allocating
> resources
>
>                                            between users. May be one of:
>
>                                              dominant_resource_fairness
> (drf) (default: drf)
>
>   --[no-]version                           Show version and exit.
> (default: false)
>
>   --webui_dir=VALUE                        Directory path of the webui
> files/assets (default: /usr/share/mesos/webui)
>
>   --weights=VALUE                          A comma separated list of
> role/weight pairs
>
>                                            of the form
> 'role=weight,role=weight'. Weights
>
>                                            are used to indicate forms of
> priority.
>
>   --whitelist=VALUE                        Path to a file with a list of
> slaves
>
>                                            (one per line) to advertise
> offers for.
>
>                                            Path could be of the form
> 'file:///path/to/file' or '/path/to/file'.
>
>   --work_dir=VALUE                         Directory path to store the
> persistent information stored in the
>
>                                            Registry. (example:
> /var/lib/mesos/master)
>
>   --zk=VALUE                               ZooKeeper URL (used for leader election amongst masters)
>                                            May be one of:
>                                              zk://host1:port1,host2:port2,.../path
>                                              zk://username:password@host1:port1,host2:port2,.../path
>                                              file:///path/to/file (where file contains one of the above)
>
>   --zk_session_timeout=VALUE               ZooKeeper session timeout.
> (default: 10secs)
>
>
>
> Furthermore, setting these parameters, either in /etc/mesos-master/ or
> inline, generates the following error:
>
> # /usr/sbin/mesos-master --zk=zk://10.40.50.228:2181/mesos --port=5050 --log_dir=/var/log/mesos --hostname=10.40.50.228 --ip=10.40.50.228 --quorum=1 --work_dir=/var/lib/mesos --max_slave_ping_timeouts=2
> Failed to load unknown flag 'max_slave_ping_timeouts'
>
> Usage: mesos-master [...]
>
>
>
> Supported options:
>
>   --acls=VALUE                             The valu
>
> …
>
>
>
> Any thoughts?
>
> Cheers,
>
> *Nastooh Avessta*
>
>
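As a rough illustration of the timing described in the quoted message (a sketch of the arithmetic only, not Mesos code): if the master declares a slave lost after a fixed number of consecutive missed pings, the detection window is roughly the ping timeout multiplied by that count. The function name below is made up for illustration, and the 15-second and 5-ping figures are simply the ones quoted in the question.

```python
def detection_window_secs(ping_timeout_secs, max_ping_timeouts):
    """Approximate time for the master to notice a dead slave.

    Assumes the master gives up once `max_ping_timeouts` consecutive
    pings have each gone unanswered for `ping_timeout_secs`.
    """
    if ping_timeout_secs <= 0 or max_ping_timeouts <= 0:
        raise ValueError("both values must be positive")
    return ping_timeout_secs * max_ping_timeouts


# Figures quoted in the question: 15 s per ping, 5 missed pings allowed.
print(detection_window_secs(15, 5))  # 75
# The attempted override (2 missed pings) with the same 15 s ping timeout:
print(detection_window_secs(15, 2))  # 30
```

On this model, lowering only the missed-ping count to 2 would still leave about 30 seconds, so getting under 10 seconds would need the per-ping timeout lowered as well. Whether the installed version accepts either value as a flag is a separate question; the "unknown flag" error quoted above suggests this build does not.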