Posted to issues@aurora.apache.org by "David Robinson (JIRA)" <ji...@apache.org> on 2016/09/30 00:48:21 UTC

[jira] [Created] (AURORA-1786) -zk_session_timeout option does not work

David Robinson created AURORA-1786:
--------------------------------------

             Summary: -zk_session_timeout option does not work
                 Key: AURORA-1786
                 URL: https://issues.apache.org/jira/browse/AURORA-1786
             Project: Aurora
          Issue Type: Bug
            Reporter: David Robinson


It looks like the -zk_session_timeout option has no effect. I've set -zk_session_timeout="60mins" to try to work around ZK session timeouts (due to GC pauses caused by TaskHistoryPruner pruning a huge number of inactive tasks), but the default of 30 seconds always seems to be used.

{noformat}
I0929 22:36:10.804 [main, ArgScanner:411] zk_chroot_path: null 
I0929 22:36:10.804 [main, ArgScanner:411] zk_digest_credentials: xxxx:xxxx 
I0929 22:36:10.805 [main, ArgScanner:411] zk_endpoints: [zk.example.com:2181] 
I0929 22:36:10.805 [main, ArgScanner:411] zk_in_proc: false 
I0929 22:36:10.805 [main, ArgScanner:411] zk_session_timeout: (30, mins) 
I0929 22:36:10.805 [main, ArgScanner:411] zk_use_curator: true 
{noformat}

{noformat}
I0929 22:48:37.678 [AsyncProcessor-3, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:48:37.738 [AsyncProcessor-5, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
2016-09-29 22:48:37,794:47040(0x7f07f4c3c940):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 12ms
I0929 22:48:37.805 [AsyncProcessor-0, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:48:37.814 [AsyncProcessor-6, MemTaskStore:148] Query took 588 ms: ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0, limit=0} 
I0929 22:48:37.867 [AsyncProcessor-1, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:48:37.873 [AsyncProcessor-2, MemTaskStore:148] Query took 304 ms: ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0, limit=0} 
I0929 22:48:37.875 [AsyncProcessor-7, MemTaskStore:148] Query took 289 ms: ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0, limit=0} 
I0929 22:48:37.886 [AsyncProcessor-4, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:48:38.045 [AsyncProcessor-3, MemTaskStore:148] Query took 359 ms: ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0, limit=0} 
I0929 22:48:38.152 [AsyncProcessor-5, MemTaskStore:148] Query took 405 ms: ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0, limit=0} 
I0929 22:48:38.407 [AsyncProcessor-0, MemTaskStore:148] Query took 594 ms: ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0, limit=0} 
I0929 22:48:38.442 [AsyncProcessor-1, MemTaskStore:148] Query took 566 ms: ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0, limit=0} 
I0929 22:48:38.445 [AsyncProcessor-4, MemTaskStore:148] Query took 550 ms: ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0, limit=0} 
I0929 22:48:38.460 [AsyncProcessor-7, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:48:38.468 [AsyncProcessor-2, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
2016-09-29 22:48:51,141:47040(0x7f07f4c3c940):ZOO_WARN@zookeeper_interest@1570: Exceeded deadline by 13ms
I0929 22:49:01.002467 47173 process.cpp:3323] Handling HTTP event for process 'metrics' with path: '/metrics/snapshot'
I0929 22:48:38.483 [AsyncProcessor-6, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
W0929 22:49:07.165 [main-SendThread(smf1-alj-03-sr1.prod.twitter.com:2181), ClientCnxn$SendThread:1108] Client session timed out, have not heard from server in 36019ms for sessionid 0x576f9386901ce3 
W0929 22:49:07.168 [qtp382517336-72, LeaderRedirect:194] No serviceGroupMonitor in host set, will not redirect despite not being leader. 
I0929 22:49:07.170 [qtp382517336-72, Slf4jRequestLog:60] 127.0.0.1 - - [29/Sep/2016:22:49:07 +0000] "GET //localhost:8081/quotas HTTP/1.1" 503 1561  
I0929 22:49:07.171 [AsyncProcessor-7, MemTaskStore:148] Query took 28701 ms: ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0, limit=0} 
I0929 22:49:07.171 [AsyncProcessor-2, MemTaskStore:148] Query took 28693 ms: ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0, limit=0} 
I0929 22:49:07.171 [qtp382517336-52, Slf4jRequestLog:60] 127.0.0.1 - - [29/Sep/2016:22:49:07 +0000] "GET //localhost:8081/vars.json?filtered=1 HTTP/1.1" 200 34679  
I0929 22:49:07.172 [main-SendThread(smf1-alj-03-sr1.prod.twitter.com:2181), ClientCnxn$SendThread:1156] Client session timed out, have not heard from server in 36019ms for sessionid 0x576f9386901ce3, closing socket connection and attempting reconnect 
I0929 22:49:07.179 [AsyncProcessor-0, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:49:07.179 [AsyncProcessor-5, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:49:07.273 [main-EventThread, ConnectionStateManager:228] State change: SUSPENDED 
E0929 22:49:07.345 [Curator-ConnectionStateManager-0, SchedulerLifecycle$SchedulerCandidateImpl:395] Lost leadership, committing suicide. 
I0929 22:49:07.359 [Curator-ConnectionStateManager-0, StateMachine$Builder:389] SchedulerLifecycle state machine transition LEADER_AWAITING_REGISTRATION -> DEAD
{noformat}
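
One thing that may be worth ruling out, as an assumption rather than something the logs above confirm: a ZooKeeper server does not accept an arbitrary client-requested session timeout. It clamps the request to the range [minSessionTimeout, maxSessionTimeout], which default to 2 x tickTime and 20 x tickTime. The sketch below only illustrates that clamping. The function name is hypothetical, and tickTime=2000 ms is the value from the sample zoo.cfg, not a value known for this ensemble.

```python
def negotiated_session_timeout_ms(requested_ms, tick_time_ms=2000,
                                  min_session_timeout_ms=None,
                                  max_session_timeout_ms=None):
    """Clamp a client-requested session timeout to the server's bounds.

    Hypothetical helper for illustration only. tick_time_ms=2000 is an
    assumption (the sample zoo.cfg value), not read from a real ensemble.
    """
    # When unset, the server derives the bounds from tickTime.
    if min_session_timeout_ms is None:
        min_session_timeout_ms = 2 * tick_time_ms
    if max_session_timeout_ms is None:
        max_session_timeout_ms = 20 * tick_time_ms
    return max(min_session_timeout_ms,
               min(requested_ms, max_session_timeout_ms))


# A 60 minute request against the assumed defaults.
requested_ms = 60 * 60 * 1000
print(negotiated_session_timeout_ms(requested_ms))
```

If the ensemble does use bounds like these, a 60 minute request would be reduced to the server's maximum no matter how the scheduler parses the flag, so the server-side maxSessionTimeout is worth checking alongside the scheduler option.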



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)