Posted to issues@mesos.apache.org by "Andrei Sekretenko (Jira)" <ji...@apache.org> on 2019/10/08 14:24:00 UTC

[jira] [Created] (MESOS-10008) Invalid quota config can crash master

Andrei Sekretenko created MESOS-10008:
-----------------------------------------

             Summary: Invalid quota config can crash master
                 Key: MESOS-10008
                 URL: https://issues.apache.org/jira/browse/MESOS-10008
             Project: Mesos
          Issue Type: Improvement
            Reporter: Andrei Sekretenko


We are observing the following crash on the 1.9.1 master:

{code}
I1008 10:12:15.148486  4687 http.cpp:1115] HTTP POST for /master/api/v1?_ts=1570529541073&UPDATE_QUOTA from 10.0.7.253:35410 with User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) Ap>
I1008 10:12:15.148665  4687 http.cpp:263] Processing call UPDATE_QUOTA
I1008 10:12:15.148756  4687 quota_handler.cpp:1136] Authorizing principal 'bootstrapuser' to update quota config for role 's1'
I1008 10:12:15.149169  4685 registrar.cpp:487] Applied 1 operations in 56277ns; attempting to update the registry
I1008 10:12:15.149338  4681 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 13
I1008 10:12:15.149467  4689 replica.cpp:541] Replica received write request for position 13 from __req_res__(29)@10.0.7.253:5050
I1008 10:12:15.151820  4683 replica.cpp:695] Replica received learned notice for position 13 from log-network(2)@10.0.7.253:5050
I1008 10:12:15.153559  4679 registrar.cpp:544] Successfully updated the registry in 4.348928ms
I1008 10:12:15.153592  4678 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 14
I1008 10:12:15.153715  4679 hierarchical.cpp:1619] Updated quota for role 's1',  guarantees: {} limits: cpus:2; disk:-9.22337203685478e+15; gpus:3; mem:1000000000000
I1008 10:12:15.153796  4677 replica.cpp:541] Replica received write request for position 14 from __req_res__(30)@10.0.7.253:5050
I1008 10:12:15.155380  4691 replica.cpp:695] Replica received learned notice for position 14 from log-network(2)@10.0.7.253:5050
I1008 10:12:15.249722  4677 authenticator.cpp:324] dstip=10.0.7.253 type=audit timestamp=2019-10-08 10:12:15.249673984+00:00 reason="Valid authentication token" uid="bootstrapuser" obje>
I1008 10:12:15.249956  4682 http.cpp:1115] HTTP GET for /master/state-summary?_ts=1570529541169 from 10.0.7.253:35414 with User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebK>
I1008 10:12:15.250633  4691 http.cpp:1132] HTTP GET for /master/state-summary?_ts=1570529541169 from 10.0.7.253:35414: '200 OK' after 1.72621ms
I1008 10:12:15.570379  4689 hierarchical.cpp:1908] Before allocation, required quota headroom is {} and available quota headroom is cpus:0.9; disk:75853; mem:5507
F1008 10:12:15.570580  4689 resource_quantities.cpp:330] Check failed: scalar >= Value::Scalar() (-9.22337203685478e+15 vs. 0)
*** Check failure stack trace: ***
    @     0x7fc786f0148d  google::LogMessage::Fail()
    @     0x7fc786f036e8  google::LogMessage::SendToLog()
    @     0x7fc786f01023  google::LogMessage::Flush()
    @     0x7fc786f04029  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc785954dfa  mesos::ResourceQuantities::add()
    @     0x7fc785954fb6  mesos::ResourceQuantities::fromScalarResource()
    @     0x7fc78595e135  mesos::shrinkResources()
    @     0x7fc785a874a9  mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::__allocate()
    @     0x7fc785a88089  mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::_allocate()
    @     0x7fc785a93882  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchI7NothingN5mesos8internal6master9allocator8internal28Hier>
    @     0x7fc786e49e21  process::ProcessBase::consume()
    @     0x7fc786e6141b  process::ProcessManager::resume()
    @     0x7fc786e670b6  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
    @     0x7fc782a28b22  (unknown)
    @     0x7fc7821be94a  (unknown)
    @     0x7fc781eef07f  clone
{code}
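
The fatal check in the trace requires every scalar to be non-negative, so one way to avoid the crash would be to reject such a quota config when the request is handled rather than during allocation. The following is only a minimal sketch of that idea, not Mesos code: the function name `validate_limits` and the plain name-to-value mapping are hypothetical, and where such a check would actually belong in the master is not established by this report.

```python
import math


def validate_limits(limits):
    """Return a list of error strings for a {resource name: value} mapping.

    Hypothetical helper for illustration only; it is not part of Mesos.
    """
    errors = []
    for name, value in sorted(limits.items()):
        if not math.isfinite(value):
            # Reject NaN and +/-infinity outright.
            errors.append(f"{name}: value must be finite")
        elif value < 0:
            # Mirrors the condition of the failed check: scalar >= 0.
            errors.append(f"{name}: value must be non-negative")
    return errors


# Limits as they appear in the "Updated quota" log line above.
logged_limits = {
    "cpus": 2,
    "disk": -9.22337203685478e+15,
    "gpus": 3,
    "mem": 1000000000000,
}
problems = validate_limits(logged_limits)
```

With the limits from the log, only the `disk` entry is reported. Such a check would have to run on the value as the master stores it, since the report does not show what value the client originally sent.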

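Note that the disk quota limit is already *logged* as a negative value (disk:-9.22337203685478e+15) when the quota is updated at hierarchical.cpp:1619, about 0.4 seconds before the same value trips the check in resource_quantities.cpp:330.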
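
One observation that may help narrow this down: the logged number coincides with the smallest signed 64-bit integer divided by 1000. That would be consistent with an overflow in a fixed-point representation that keeps three decimal digits, but this is an assumption; nothing in the log confirms where the value comes from. The arithmetic can be checked with a short sketch (the variable names are illustrative):

```python
# Smallest value representable in a signed 64-bit integer.
INT64_MIN = -(2 ** 63)

# Assumption: the scalar is held as an integer count of thousandths,
# so converting back to a floating-point value divides by 1000.
candidate = INT64_MIN / 1000

# Format with 15 significant digits, matching the precision seen in the log.
formatted = "%.15g" % candidate
print(formatted)  # -9.22337203685478e+15
```

The printed value matches both the "Updated quota" log line and the failed check message.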



--
This message was sent by Atlassian Jira
(v8.3.4#803005)