Posted to user@mesos.apache.org by Tim Chou <ti...@gmail.com> on 2014/10/01 07:46:05 UTC

Why do I always get "No master is currently leading ..." at master:5050

Hi all,

I have been struggling with this problem for a while.

I think I have started Mesos successfully. There are two nodes in my
cluster: I use one as the master and the other as a slave.

On the master node, I run "mesos-master.sh
--work_dir=/scratch/fzhou/git/mesos-0.20.0/workdir --ip=10.1.2.13" to
start the master.
On the slave node, I run "mesos-slave.sh --master=10.1.2.13:5050" to
start it.
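
As a sanity check from the command line (a sketch, assuming curl is
available and the addresses above): the master serves its state as
JSON, which should list the registered slave and name the leading
master.

  # A registered slave shows up in the "slaves" array; "leader" should
  # name this master, e.g. "master@10.1.2.13:5050".
  curl -s http://10.1.2.13:5050/master/state.json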

The master and slave logs printed on the screen are shown at the end.

I also ran the Python test framework from the Getting Started page. It
works; its output is shown at the end as well.
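
The rough invocation from the Getting Started page looks like this (run
from the build directory; the master address is an assumption based on
the setup above):

  # Launch the example Python framework against the running master; it
  # runs a handful of tasks and exits when they finish.
  ./src/examples/python/test-framework 10.1.2.13:5050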

However, each time I check the web page at master:5050 via lynx, I
always get the message:
"No master is currently leading ...
   [BUTTON] ? This master is not the leader, redirecting in {{redirect /
1000}} seconds ... go now"

What's wrong with my mesos cluster?

Looking forward to your replies.

Thanks,
Tim


Python test results:
"...

accepting offer on ucs02 to start task 4

Task 4 is in state 1

Task 4 is in state 2

All tasks done, waiting for final framework message

Received message: 'data with a \x00 byte'

All tasks done, and all messages received, exiting

I1001 00:11:39.143679 13398 sched.cpp:747] Stopping framework
'20141001-000931-218235146-5050-13321-0000'
"

Master:

I1001 00:27:31.279867 13563 hierarchical_allocator_process.hpp:442] Added slave 20140930-135338-201457930-5050-13549-0 (ucs02) with cpus(*):8; mem(*):6845; disk(*):9642; ports(*):[31000-32000] (and cpus(*):8; mem(*):6845; disk(*):9642; ports(*):[31000-32000] available)
I1001 00:27:31.287533 13569 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 7.758244ms
I1001 00:27:31.287555 13569 replica.cpp:676] Persisted action at 8
I1001 00:27:31.287817 13565 replica.cpp:655] Replica received learned notice for position 8
I1001 00:27:31.295857 13565 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 8.020761ms
I1001 00:27:31.295896 13565 leveldb.cpp:401] Deleting ~2 keys from leveldb took 18279ns
I1001 00:27:31.295909 13565 replica.cpp:676] Persisted action at 8
I1001 00:27:31.295920 13565 replica.cpp:661] Replica learned TRUNCATE action at position 8

Slave:

I1001 00:32:31.668547 24652 status_update_manager.cpp:167] New master detected at master@10.1.2.13:5050
I1001 00:32:31.668566 24651 slave.cpp:636] Detecting new master
I1001 00:32:32.484762 24651 slave.cpp:816] Re-registered with master master@10.1.2.13:5050
I1001 00:33:31.677762 24648 slave.cpp:3050] Current usage 43.95%. Max allowed age: 3.223409898738067days
I1001 00:34:31.692034 24649 slave.cpp:3050] Current usage 43.95%. Max allowed age: 3.223409898738067days
I1001 00:35:31.705721 24647 slave.cpp:3050] Current usage 43.95%. Max allowed age: 3.223409898738067days

Re: Why do I always get "No master is currently leading ..." at master:5050

Posted by Tim Chou <ti...@gmail.com>.
Hi Jiang,

You're right. This morning I went to the server room, connected a
monitor, opened my browser, and checked mesos-master:5050. I found the
cluster information I wanted.

Thank you so much,
Tim

Re: Why do I always get "No master is currently leading ..." at master:5050

Posted by Yan Xu <ya...@jxu.me>.
Your tests were successful, so the master and slave should be working
correctly. What you saw with lynx is just hidden template text that you
won't see with a GUI web browser: the master's web UI is a JavaScript
application, and lynx doesn't run JavaScript, so placeholder text that a
GUI browser would hide gets rendered as-is.
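
If you ever want to confirm leadership without a GUI browser, the
master's state endpoint returns plain JSON (a sketch, assuming curl is
available):

  # A leading master reports its own address in the "leader" field,
  # e.g. "leader": "master@10.1.2.13:5050".
  curl -s http://10.1.2.13:5050/master/state.json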

--
Jiang Yan Xu <ya...@jxu.me> @xujyan <http://twitter.com/xujyan>
