Posted to user@mesos.apache.org by Ondrej Smola <on...@gmail.com> on 2015/01/19 14:04:31 UTC

Storm on Mesos - 3 Masters

Hi,

We have a Mesos cluster installation with 3 masters (0.21.0) and ZK (3.4.5),
running Spark, Chronos, Marathon and Storm 0.9.3. All nodes run Ubuntu
14.04.

My problem is that I have to start MesosNimbus on the currently elected
leader; otherwise MesosNimbus gets stuck. From the log I can see that it
detects the currently leading master correctly, but then it gets stuck. When
the leader changes to the node running Nimbus, it works again.

nimbus upstart.log

I0119 12:20:03.289799 10728 detector.cpp:433] A new leading master (UPID=master@192.168.56.11:5050) is detected
I0119 12:20:03.290081 10733 sched.cpp:234] New master detected at master@192.168.56.11:5050
I0119 12:20:03.290592 10733 sched.cpp:242] No credentials provided. Attempting to register without authentication

nimbus.log

2015-01-19T12:15:40.478+0100 o.m.log [DEBUG] started Server@20e1ceb3
2015-01-19T12:15:40.478+0100 s.m.MesosNimbus [INFO] Started serving config dir under http://192.168.56.10:49202/conf
2015-01-19T12:15:40.535+0100 s.m.MesosNimbus [INFO] Waiting for scheduler to initialize...

On the leading Mesos master I see the following log (repeated every second):

mesos.log

I0119 12:40:53.208027  4957 master.cpp:1520] Received re-registration request from framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
I0119 12:40:53.208860  4957 master.cpp:1573] Re-registering framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3)  at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
I0119 12:40:53.209205  4957 master.cpp:1602] Framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310 failed over
I0119 12:40:53.211552  4957 hierarchical_allocator_process.hpp:375] Activated framework 20150119-114412-171485376-5050-6660-0002
I0119 12:40:53.211932  4959 master.cpp:789] Framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310 disconnected
I0119 12:40:53.212004  4959 master.cpp:1752] Disconnecting framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
I0119 12:40:53.212198  4959 master.cpp:1768] Deactivating framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310
I0119 12:40:53.212446  4959 master.cpp:811] Giving framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310 1hrs to failover
I0119 12:40:53.212550  4959 hierarchical_allocator_process.hpp:405] Deactivated framework 20150119-114412-171485376-5050-6660-0002
I0119 12:40:54.209858  4959 master.cpp:1520] Received re-registration request from framework 20150119-114412-171485376-5050-6660-0002 (Storm 0.9.3) at scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310


Other frameworks work fine and correctly handle a leading master on another
node.
From a brief look at the source code, it hangs in

https://github.com/mesos/storm/blob/master/src/storm/mesos/MesosNimbus.java
at line 153

when trying to acquire a semaphore.
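
For readers unfamiliar with this kind of hang: a thread that blocks on a semaphore waits forever if nothing ever releases it. The sketch below is Python, not the actual MesosNimbus Java code, and it assumes (this is a guess, not confirmed by the source) that the semaphore is released only once the scheduler has registered with the master. The names `wait_for_registration` and `demo` are made up for illustration. Adding a timeout turns a silent hang into a visible failure:

```python
import threading


def wait_for_registration(initialized: threading.Semaphore, timeout_s: float) -> bool:
    """Block until the semaphore is released or the timeout expires.

    Returns True if the release happened in time, False otherwise.
    """
    return initialized.acquire(timeout=timeout_s)


def demo(register_after_s, timeout_s):
    """Simulate a startup that waits for a registration callback.

    register_after_s=None models the failing case: the callback never fires.
    """
    initialized = threading.Semaphore(0)
    if register_after_s is not None:
        # Hypothetical stand-in for the scheduler's "registered" callback.
        timer = threading.Timer(register_after_s, initialized.release)
        timer.daemon = True
        timer.start()
    return wait_for_registration(initialized, timeout_s)


if __name__ == "__main__":
    print("registered in time:", demo(0.05, 2.0))
    print("registered in time:", demo(None, 0.2))
```

With an untimed acquire, the second case would block indefinitely, which is consistent with the "Waiting for scheduler to initialize..." line being the last entry in nimbus.log.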


Thank you for your great work.

Ondrej Smola

Re: Storm on Mesos - 3 Masters

Posted by Ondrej Smola <on...@gmail.com>.
Hi,
thank you for your reply. As you mentioned, the issue was Storm binding to
the wrong interface in VirtualBox while I was testing an automated cluster
setup. After setting it up on a bare-metal test cluster, everything works
correctly.
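
A quick way to spot this kind of misbinding before starting a framework is to check what the machine's own hostname resolves to. On Ubuntu, /etc/hosts commonly maps the hostname to 127.0.1.1, so a process that advertises the address of its hostname can end up announcing a loopback address. Whether that was the exact cause here is an assumption. The Python sketch below uses illustrative function names that are not part of Mesos or Storm:

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the IP address lies in a loopback range (e.g. 127.0.0.0/8)."""
    return ipaddress.ip_address(address).is_loopback


def resolved_hostname_address() -> str:
    """Resolve this machine's hostname the way many services do at startup."""
    return socket.gethostbyname(socket.gethostname())


if __name__ == "__main__":
    try:
        addr = resolved_hostname_address()
    except socket.gaierror as exc:
        print("hostname does not resolve:", exc)
    else:
        if is_loopback(addr):
            print("hostname resolves to loopback address", addr)
        else:
            print("hostname resolves to", addr)
```

If this reports a loopback address, possible remedies include correcting the hosts entry or setting the `LIBPROCESS_IP` environment variable, which libprocess reads to choose the address it uses. Which one is appropriate depends on the setup.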

Thanks


2015-02-06 18:41 GMT+01:00 Tomas Barton <ba...@gmail.com>:

> Hi,
>
> Sorry for the late reply. I found the message in spam by accident.
>
> It seems like Storm is binding to the loopback address 127.0.1.1:52310
> instead of using the public interface.
>
> Regards,
> Tomas

Re: Storm on Mesos - 3 Masters

Posted by Tomas Barton <ba...@gmail.com>.
Hi,

Sorry for the late reply. I found the message in spam by accident.

It seems like Storm is binding to the loopback address 127.0.1.1:52310
instead of using the public interface.
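
The master sends its replies to the address the scheduler advertises, so a loopback address can only work when the scheduler runs on the same host as the leading master, which would explain the symptom. As a rough illustration, the Python sketch below pulls the advertised endpoint out of a master log line and flags loopback addresses. The regular expression and the function names are assumptions based only on the log lines quoted in this thread, not on any documented Mesos log format:

```python
import ipaddress
import re

# Matches "... at <name>@<ipv4>:<port>" as seen in the quoted log lines.
ENDPOINT_RE = re.compile(r"at (\S+)@(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def endpoint(log_line: str):
    """Return (ip, port) of the endpoint named in the log line, or None."""
    match = ENDPOINT_RE.search(log_line)
    if match is None:
        return None
    return match.group(2), int(match.group(3))


def advertises_loopback(log_line: str) -> bool:
    """True if the log line names an endpoint on a loopback address."""
    found = endpoint(log_line)
    return found is not None and ipaddress.ip_address(found[0]).is_loopback


if __name__ == "__main__":
    line = (
        "I0119 12:40:53.208027  4957 master.cpp:1520] Received re-registration "
        "request from framework 20150119-114412-171485376-5050-6660-0002 "
        "(Storm 0.9.3) at "
        "scheduler-37d9a510-1136-4adb-be09-c9c2e388611f@127.0.1.1:52310"
    )
    print(endpoint(line), advertises_loopback(line))
```

The log line has to be passed as a single string; the wrapped lines of an email need to be joined first.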

Regards,
Tomas

