Posted to issues@mesos.apache.org by "Bhuvan Arumugam (JIRA)" <ji...@apache.org> on 2014/11/05 00:39:33 UTC

[jira] [Updated] (MESOS-2043) framework auth fail with timeout error and never get authenticated

     [ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bhuvan Arumugam updated MESOS-2043:
-----------------------------------
    Attachment: aurora-scheduler.20141104-1606-1706.log
                mesos-master.20141104-1606-1706.log

[~vinodkone] it wasn't a failover; the master was restarted after an upgrade.
I've attached the logs from both the master and the scheduler, covering the first hour.

master log snippet:
{code}
I1104 16:06:39.019181 35273 master.cpp:3874] Authenticating scheduler-8160bf27-7799-4b8c-921c-b2e87869475b@AURORA_IP:8083
I1104 16:06:39.019480 35273 master.cpp:3885] Using default CRAM-MD5 authenticator
I1104 16:06:39.020884 35290 authenticator.hpp:107] Initializing server SASL     
I1104 16:06:39.022680 35290 authenticator.hpp:169] Creating new server SASL connection
W1104 16:06:44.022080 35275 master.cpp:3953] Authentication timed out           
{code}

scheduler log snippet:
{code}
I1104 16:06:34.006535 23272 detector.cpp:138] Detected a new leader: (id='115') 
I1104 16:06:34.007257 23270 group.cpp:659] Trying to get '/mesos/info_0000000115' in ZooKeeper
I1104 16:06:34.008654 23270 detector.cpp:433] A new leading master (UPID=master@MASTER_IP:PORT) is detected
W1104 16:06:34.009 THREAD3393 org.apache.aurora.scheduler.MesosSchedulerImpl.disconnected: Framework disconnected.
I1104 16:06:34.010 THREAD3393 org.apache.aurora.scheduler.async.OfferQueue$OfferQueueImpl.driverDisconnected: Clearing stale offers since the driver is disconnected.
I1104 16:06:34.010766 23281 sched.cpp:233] New master detected at master@MASTER_IP:PORT
I1104 16:06:34.010834 23281 sched.cpp:283] Authenticating with master master@MASTER_IP:PORT
I1104 16:06:34.011281 23263 authenticatee.hpp:133] Creating new client SASL connection
W1104 16:06:39.016166 23274 sched.cpp:378] Authentication timed out             
I1104 16:06:39.016585 23263 sched.cpp:338] Failed to authenticate with master master@MASTER_IP:PORT: Authentication discarded
I1104 16:06:39.016669 23263 sched.cpp:283] Authenticating with master master@MASTER_IP:PORT
I1104 16:06:39.017057 23282 authenticatee.hpp:133] Creating new client SASL connection
I1104 16:06:39.023083 23279 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5
I1104 16:06:39.023138 23279 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5'
W1104 16:06:44.022470 23268 sched.cpp:378] Authentication timed out             
{code}
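
In both snippets above, each "Authentication timed out" warning comes roughly 5 seconds after the matching "Authenticating" line. A minimal sketch for checking that gap across the attached logs, assuming the glog-style prefix seen in the snippets (severity letter, MMDD, time with microseconds, thread id, source file and line). The function names and the year default are made up for illustration and are not part of Mesos or Aurora:

```python
import re
from datetime import datetime

# Assumed glog prefix, e.g.:
# "W1104 16:06:44.022080 35275 master.cpp:3953] Authentication timed out"
GLOG = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def parse(line, year=2014):
    """Return (timestamp, severity, message), or None for non-glog lines."""
    m = GLOG.match(line.strip())
    if m is None:
        return None  # e.g. the Aurora Java log lines, which use another format
    ts = datetime.strptime(
        f"{year}{m['date']} {m['time']}", "%Y%m%d %H:%M:%S.%f"
    )
    return ts, m["sev"], m["msg"]


def auth_timeouts(lines):
    """Seconds between each 'Authenticating ...' line and the next timeout."""
    started = None
    gaps = []
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, _sev, msg = parsed
        if msg.startswith("Authenticating"):
            started = ts
        elif msg.startswith("Authentication timed out") and started is not None:
            gaps.append((ts - started).total_seconds())
            started = None
    return gaps


if __name__ == "__main__":
    sample = [
        "I1104 16:06:39.019181 35273 master.cpp:3874] Authenticating "
        "scheduler-8160bf27-7799-4b8c-921c-b2e87869475b@AURORA_IP:8083",
        "W1104 16:06:44.022080 35275 master.cpp:3953] Authentication timed out",
    ]
    for gap in auth_timeouts(sample):
        print(f"timed out after {gap:.3f}s")
```

Running the same function over the scheduler log should show whether every attempt hits the same limit or whether some attempts fail earlier for a different reason.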


> framework auth fail with timeout error and never get authenticated
> ------------------------------------------------------------------
>
>                 Key: MESOS-2043
>                 URL: https://issues.apache.org/jira/browse/MESOS-2043
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.0
>            Reporter: Bhuvan Arumugam
>         Attachments: aurora-scheduler.20141104-1606-1706.log, mesos-master.20141104-1606-1706.log
>
>
> I'm facing this issue in master as of https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4
> As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm running 1 master and 1 scheduler (aurora). The framework authentication fails due to a timeout:
> error on mesos master:
> {code}
> I1104 19:37:17.741449  8329 master.cpp:3874] Authenticating scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083
> I1104 19:37:17.741585  8329 master.cpp:3885] Using default CRAM-MD5 authenticator
> I1104 19:37:17.742106  8336 authenticator.hpp:169] Creating new server SASL connection
> W1104 19:37:22.742959  8329 master.cpp:3953] Authentication timed out
> W1104 19:37:22.743548  8329 master.cpp:3930] Failed to authenticate scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: Authentication discarded
> {code}
> scheduler error:
> {code}
> I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master master@MASTER_IP:PORT
> I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL connection
> I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5
> I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5'
> W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out
> I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master master@MASTER_IP:PORT: Authentication discarded
> {code}
> It looks like 2 instances of the same framework, {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} & {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}}, are trying to authenticate and failing.
> {code}
> W1104 19:36:30.769420  8319 master.cpp:3930] Failed to authenticate scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to communicate with authenticatee
> I1104 19:36:42.701441  8328 master.cpp:3860] Queuing up authentication request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 because authentication is still in progress
> {code}
> Restarting the master and the scheduler didn't fix it.
> This particular issue happens with 1 master and 1 scheduler, even after MESOS-1866 was fixed.


