You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@mesos.apache.org by Frank Scholten <fr...@frankscholten.nl> on 2016/12/27 13:02:49 UTC

mesos-execute stuck in subscribe loop

Hi,

I am running Mesos 0.26.2 and the following command is stuck in a subscribe loop

# GLOG_v=1 LIBPROCESS_IP=10.2.0.219  mesos-execute
--master=10.2.1.116:5050 --name="cluster-test" --command="sleep 5"
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1227 04:24:32.049841 41523 process.cpp:965] libprocess is initialized
on 10.2.0.219:39749 for 48 cpus
I1227 04:24:32.049999 41523 logging.cpp:198] Logging to STDERR
I1227 04:24:32.051537 41523 sched.cpp:166] Version: 0.26.2
I1227 04:24:32.055392 41543 sched.cpp:264] New master detected at
master@10.2.1.116:5050
I1227 04:24:32.055551 41543 sched.cpp:274] No credentials provided.
Attempting to register without authentication
I1227 04:24:32.055565 41543 sched.cpp:716] Sending SUBSCRIBE call to
master@10.2.1.116:5050
I1227 04:24:32.055624 41543 sched.cpp:749] Will retry registration in
587.356137ms if necessary
I1227 04:24:32.644569 41543 sched.cpp:716] Sending SUBSCRIBE call to
master@10.2.1.116:5050
I1227 04:24:32.644649 41543 sched.cpp:749] Will retry registration in
2.884023157secs if necessary

I ran tpcdump on the master

# tcpdump -i bond0.7 dst 10.2.0.219
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on bond0.7, link-type EN10MB (Ethernet), capture size 65535 bytes
05:01:35.207079 IP cnlvr02r04c31.mmcc > agent1.cncf.io.52860: Flags
[F.], seq 2751495045, ack 539538316, win 336, options [nop,nop,TS val
868111506 ecr 867845120], length 0
0x0000:  4500 0034 0b6f 4000 4006 1903 0a02 0174  E..4.o@.@......t
0x0010:  0a02 00db 13ba ce7c a400 7b85 2028 b38c  .......|..{..(..
0x0020:  8011 0150 899d 0000 0101 080a 33be 5492  ...P........3.T.
0x0030:  33ba 4400                                3.D.
05:01:35.549544 IP cnlvr02r04c31.mmcc > agent1.cncf.io.52862: Flags
[S.], seq 3372848870, ack 3473989761, win 26844, options [mss
8960,sackOK,TS val 868111591 ecr 867845205,nop,wscale 7], length 0
0x0000:  4500 003c 0000 4000 4006 246a 0a02 0174  E..<..@.@.$j...t
0x0010:  0a02 00db 13ba ce7e c909 96e6 cf10 e081  .......~........
0x0020:  a012 68dc b904 0000 0204 2300 0402 080a  ..h.......#.....
0x0030:  33be 54e7 33ba 4455 0103 0307            3.T.3.DU....
05:01:35.549746 IP cnlvr02r04c31.mmcc > agent1.cncf.io.52862: Flags
[.], ack 318, win 219, options [nop,nop,TS val 868111591 ecr
867845205], length 0
0x0000:  4500 0034 c200 4000 4006 6271 0a02 0174  E..4..@.@.bq...t
0x0010:  0a02 00db 13ba ce7e c909 96e7 cf10 e1be  .......~........
0x0020:  8010 00db 6be1 0000 0101 080a 33be 54e7  ....k.......3.T.
0x0030:  33ba 4455                                3.DU
05:01:35.780313 IP cnlvr02r04c31.mmcc > agent1.cncf.io.52862: Flags
[.], ack 635, win 227, options [nop,nop,TS val 868111649 ecr
867845263], length 0
0x0000:  4500 0034 c201 4000 4006 6270 0a02 0174  E..4..@.@.bp...t
0x0010:  0a02 00db 13ba ce7e c909 96e7 cf10 e2fb  .......~........
0x0020:  8010 00e3 6a28 0000 0101 080a 33be 5521  ....j(......3.U!
0x0030:  33ba 448f                                3.D.

but I can't see what's in the packet since it is an encoded protobuf message.

How do you debug these kind of issues? What kind of tools do you use?

Cheers,

Frank

Re: mesos-execute stuck in subscribe loop

Posted by Frank Scholten <fr...@frankscholten.nl>.
Ah! The problem was I was not submitting to the leading master. When I
changed the --master IP to the leading master the task got submitted.
I changed the command from sleep to python.

root@cnlvr01r08s1:~# GLOG_v=1 LIBPROCESS_IP=10.2.0.219  mesos-execute
--master=10.2.1.118:5050 --name="cluster-test" --command="python -m
SimpleHTTPServer"
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1228 08:17:38.477524 42862 process.cpp:965] libprocess is initialized
on 10.2.0.219:46533 for 48 cpus
I1228 08:17:38.477645 42862 logging.cpp:198] Logging to STDERR
I1228 08:17:38.478806 42862 sched.cpp:166] Version: 0.26.2
I1228 08:17:38.482700 42863 sched.cpp:264] New master detected at
master@10.2.1.118:5050
I1228 08:17:38.482929 42863 sched.cpp:274] No credentials provided.
Attempting to register without authentication
I1228 08:17:38.482952 42863 sched.cpp:716] Sending SUBSCRIBE call to
master@10.2.1.118:5050
I1228 08:17:38.483043 42863 sched.cpp:749] Will retry registration in
1.442011578secs if necessary
I1228 08:17:38.483903 42894 sched.cpp:643] Framework registered with
e4b204c0-4d3a-44c9-880c-03bd0063fe26-0013
Framework registered with e4b204c0-4d3a-44c9-880c-03bd0063fe26-0013
I1228 08:17:38.483963 42894 sched.cpp:657] Scheduler::registered took 29377ns
I1228 08:17:38.487526 42882 resources.cpp:472] Parsing resources as
JSON failed: cpus:1;mem:128
Trying semicolon-delimited string format instead
task cluster-test submitted to slave e4b204c0-4d3a-44c9-880c-03bd0063fe26-S3012
I1228 08:17:38.487763 42882 sched.cpp:813] Scheduler::resourceOffers
took 266521ns
Received status update TASK_RUNNING for task cluster-test
I1228 08:17:38.595126 42895 sched.cpp:921] Scheduler::statusUpdate took 37167ns
I1228 08:17:43.826679 42871 sched.cpp:813] Scheduler::resourceOffers
took 41784ns

Before I got the task scheduled I the following in the log of the
master that was not leading:

I1228 08:32:10.493804 36745 master.cpp:1193] Dropping
'mesos.scheduler.Call' message since not elected yet
I1228 08:32:10.724387 36745 master.cpp:1193] Dropping
'mesos.scheduler.Call' message since not elected yet

Perhaps mesos-execute should check if the master is leading first? Or
is this already fixed in versions after 0.26.0?

Re: mesos-execute stuck in subscribe loop

Posted by haosdent <ha...@gmail.com>.
Hi, @Frank May I have your master log? Usually we check the master log
first.

On Tue, Dec 27, 2016 at 9:02 PM, Frank Scholten <fr...@frankscholten.nl>
wrote:

> Hi,
>
> I am running Mesos 0.26.2 and the following command is stuck in a
> subscribe loop
>
> # GLOG_v=1 LIBPROCESS_IP=10.2.0.219  mesos-execute
> --master=10.2.1.116:5050 --name="cluster-test" --command="sleep 5"
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I1227 04:24:32.049841 41523 process.cpp:965] libprocess is initialized
> on 10.2.0.219:39749 for 48 cpus
> I1227 04:24:32.049999 41523 logging.cpp:198] Logging to STDERR
> I1227 04:24:32.051537 41523 sched.cpp:166] Version: 0.26.2
> I1227 04:24:32.055392 41543 sched.cpp:264] New master detected at
> master@10.2.1.116:5050
> I1227 04:24:32.055551 41543 sched.cpp:274] No credentials provided.
> Attempting to register without authentication
> I1227 04:24:32.055565 41543 sched.cpp:716] Sending SUBSCRIBE call to
> master@10.2.1.116:5050
> I1227 04:24:32.055624 41543 sched.cpp:749] Will retry registration in
> 587.356137ms if necessary
> I1227 04:24:32.644569 41543 sched.cpp:716] Sending SUBSCRIBE call to
> master@10.2.1.116:5050
> I1227 04:24:32.644649 41543 sched.cpp:749] Will retry registration in
> 2.884023157secs if necessary
>
> I ran tpcdump on the master
>
> # tcpdump -i bond0.7 dst 10.2.0.219
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on bond0.7, link-type EN10MB (Ethernet), capture size 65535 bytes
> 05:01:35.207079 IP cnlvr02r04c31.mmcc > agent1.cncf.io.52860: Flags
> [F.], seq 2751495045, ack 539538316, win 336, options [nop,nop,TS val
> 868111506 ecr 867845120], length 0
> 0x0000:  4500 0034 0b6f 4000 4006 1903 0a02 0174  E..4.o@.@......t
> 0x0010:  0a02 00db 13ba ce7c a400 7b85 2028 b38c  .......|..{..(..
> 0x0020:  8011 0150 899d 0000 0101 080a 33be 5492  ...P........3.T.
> 0x0030:  33ba 4400                                3.D.
> 05:01:35.549544 IP cnlvr02r04c31.mmcc > agent1.cncf.io.52862: Flags
> [S.], seq 3372848870, ack 3473989761, win 26844, options [mss
> 8960,sackOK,TS val 868111591 ecr 867845205,nop,wscale 7], length 0
> 0x0000:  4500 003c 0000 4000 4006 246a 0a02 0174  E..<..@.@.$j...t
> 0x0010:  0a02 00db 13ba ce7e c909 96e6 cf10 e081  .......~........
> 0x0020:  a012 68dc b904 0000 0204 2300 0402 080a  ..h.......#.....
> 0x0030:  33be 54e7 33ba 4455 0103 0307            3.T.3.DU....
> 05:01:35.549746 IP cnlvr02r04c31.mmcc > agent1.cncf.io.52862: Flags
> [.], ack 318, win 219, options [nop,nop,TS val 868111591 ecr
> 867845205], length 0
> 0x0000:  4500 0034 c200 4000 4006 6271 0a02 0174  E..4..@.@.bq...t
> 0x0010:  0a02 00db 13ba ce7e c909 96e7 cf10 e1be  .......~........
> 0x0020:  8010 00db 6be1 0000 0101 080a 33be 54e7  ....k.......3.T.
> 0x0030:  33ba 4455                                3.DU
> 05:01:35.780313 IP cnlvr02r04c31.mmcc > agent1.cncf.io.52862: Flags
> [.], ack 635, win 227, options [nop,nop,TS val 868111649 ecr
> 867845263], length 0
> 0x0000:  4500 0034 c201 4000 4006 6270 0a02 0174  E..4..@.@.bp...t
> 0x0010:  0a02 00db 13ba ce7e c909 96e7 cf10 e2fb  .......~........
> 0x0020:  8010 00e3 6a28 0000 0101 080a 33be 5521  ....j(......3.U!
> 0x0030:  33ba 448f                                3.D.
>
> but I can't see what's in the packet since it is an encoded protobuf
> message.
>
> How do you debug these kind of issues? What kind of tools do you use?
>
> Cheers,
>
> Frank
>



-- 
Best Regards,
Haosdent Huang