Posted to issues@mesos.apache.org by "Andrei Sekretenko (Jira)" <ji...@apache.org> on 2019/10/08 15:47:00 UTC
[jira] [Commented] (MESOS-10008) Invalid quota config can crash master
[ https://issues.apache.org/jira/browse/MESOS-10008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946993#comment-16946993 ]
Andrei Sekretenko commented on MESOS-10008:
-------------------------------------------
Note that the code that logs the value does not correctly handle anything larger than (1 << 63) / 1000.0:
https://github.com/apache/mesos/blob/43bbe365db469d5e641d71f5884bd0fb1c012ea1/src/common/values.cpp#L58
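To illustrate the threshold mentioned above, here is a minimal sketch of a range-checked fixed-point conversion. It assumes (this is not confirmed by the comment itself) that the linked code scales the scalar by 1000 and rounds it to a signed 64-bit integer, which is what a limit of (1 << 63) / 1000.0 suggests. The helper name `toFixedChecked` is hypothetical and is not part of Mesos.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper, not Mesos code: converts `value` to a fixed-point
// integer with three decimal digits. Returns false, leaving `*out`
// untouched, when the scaled value does not fit into a long long.
bool toFixedChecked(double value, long long* out)
{
  // 2^63 for a 64-bit long long; the representable range is [-2^63, 2^63).
  const double limit =
    std::ldexp(1.0, std::numeric_limits<long long>::digits);

  const double scaled = value * 1000.0;

  // Written so that NaN fails the check as well.
  if (!(scaled >= -limit && scaled < limit)) {
    return false;
  }

  *out = std::llround(scaled);
  return true;
}
```

With a check of this kind, a disk limit above roughly 9.22e15 would be rejected at conversion time instead of producing an out-of-range rounding result such as the -9.22337203685478e+15 seen in the log below.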
> Invalid quota config can crash master
> -------------------------------------
>
> Key: MESOS-10008
> URL: https://issues.apache.org/jira/browse/MESOS-10008
> Project: Mesos
> Issue Type: Improvement
> Reporter: Andrei Sekretenko
> Priority: Major
>
> We are observing the following crash on the 1.9.1 master:
> {code}
> I1008 10:12:15.148486 4687 http.cpp:1115] HTTP POST for /master/api/v1?_ts=1570529541073&UPDATE_QUOTA from 10.0.7.253:35410 with User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) Ap>
> I1008 10:12:15.148665 4687 http.cpp:263] Processing call UPDATE_QUOTA
> I1008 10:12:15.148756 4687 quota_handler.cpp:1136] Authorizing principal 'bootstrapuser' to update quota config for role 's1'
> I1008 10:12:15.149169 4685 registrar.cpp:487] Applied 1 operations in 56277ns; attempting to update the registry
> I1008 10:12:15.149338 4681 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 13
> I1008 10:12:15.149467 4689 replica.cpp:541] Replica received write request for position 13 from __req_res__(29)@10.0.7.253:5050
> I1008 10:12:15.151820 4683 replica.cpp:695] Replica received learned notice for position 13 from log-network(2)@10.0.7.253:5050
> I1008 10:12:15.153559 4679 registrar.cpp:544] Successfully updated the registry in 4.348928ms
> I1008 10:12:15.153592 4678 coordinator.cpp:348] Coordinator attempting to write TRUNCATE action at position 14
> I1008 10:12:15.153715 4679 hierarchical.cpp:1619] Updated quota for role 's1', guarantees: {} limits: cpus:2; disk:-9.22337203685478e+15; gpus:3; mem:1000000000000
> I1008 10:12:15.153796 4677 replica.cpp:541] Replica received write request for position 14 from __req_res__(30)@10.0.7.253:5050
> I1008 10:12:15.155380 4691 replica.cpp:695] Replica received learned notice for position 14 from log-network(2)@10.0.7.253:5050
> I1008 10:12:15.249722 4677 authenticator.cpp:324] dstip=10.0.7.253 type=audit timestamp=2019-10-08 10:12:15.249673984+00:00 reason="Valid authentication token" uid="bootstrapuser" obje>
> I1008 10:12:15.249956 4682 http.cpp:1115] HTTP GET for /master/state-summary?_ts=1570529541169 from 10.0.7.253:35414 with User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebK>
> I1008 10:12:15.250633 4691 http.cpp:1132] HTTP GET for /master/state-summary?_ts=1570529541169 from 10.0.7.253:35414: '200 OK' after 1.72621ms
> I1008 10:12:15.570379 4689 hierarchical.cpp:1908] Before allocation, required quota headroom is {} and available quota headroom is cpus:0.9; disk:75853; mem:5507
> F1008 10:12:15.570580 4689 resource_quantities.cpp:330] Check failed: scalar >= Value::Scalar() (-9.22337203685478e+15 vs. 0)
> *** Check failure stack trace: ***
> @ 0x7fc786f0148d google::LogMessage::Fail()
> @ 0x7fc786f036e8 google::LogMessage::SendToLog()
> @ 0x7fc786f01023 google::LogMessage::Flush()
> @ 0x7fc786f04029 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7fc785954dfa mesos::ResourceQuantities::add()
> @ 0x7fc785954fb6 mesos::ResourceQuantities::fromScalarResource()
> @ 0x7fc78595e135 mesos::shrinkResources()
> @ 0x7fc785a874a9 mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::__allocate()
> @ 0x7fc785a88089 mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::_allocate()
> @ 0x7fc785a93882 _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchI7NothingN5mesos8internal6master9allocator8internal28Hier>
> @ 0x7fc786e49e21 process::ProcessBase::consume()
> @ 0x7fc786e6141b process::ProcessManager::resume()
> @ 0x7fc786e670b6 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> @ 0x7fc782a28b22 (unknown)
> @ 0x7fc7821be94a (unknown)
> @ 0x7fc781eef07f clone
> {code}
> Note that the value of the disk quota limit is *logged* as "negative".