Posted to user@zookeeper.apache.org by Stefano Bianchi <ja...@gmail.com> on 2016/04/14 15:09:27 UTC

Zookeeper mesos-master on different network

Hi all,
I'm working on OpenStack and I have built some virtual machines and 2
different networks with it.
I have set up two Mesos clusters:

NetworkA:
2 mesos masters
2 mesos slaves

NetworkB:
1 mesos master
1 mesos slave

I am trying to interconnect these two clusters.

I have set the ZooKeeper configurations so that all 3 masters are competing
for the leadership. Here are the main configurations:

NetworkA, on both masters:

*/etc/zookeeper/conf/zoo.cfg*: at the end of the file

server.1=192.168.100.54:2888:3888 (master1 on network A)

server.2=192.168.100.55:2888:3888 (master2 on network A)

server.3=131.154.xxx.xxx:2888:3888 (Master3 on network B, I have set a
floating IP)

*/etc/mesos/zk*

zk://192.168.100.54:2181,192.168.100.55:2181,131.154.xxx.xxx:2181/mesos

NetworkB:

*/etc/zookeeper/conf/zoo.cfg*: at the end of the file

server.1=131.154.96.27:2888:3888 (master1 on network A, I have set a floating
IP)

server.2=131.154.96.32:2888:3888 (master2 on network A, I have set a floating
IP)

server.3=192.168.10.11:2888:3888 (Master3 on network B)


*/etc/mesos/zk*:

zk://131.154.zzz.zzz:2181,131.154.yyy.yyy:2181,192.168.10.11:2181/mesos


The 3 masters seem to work fine: if I stop the mesos-master service on one of
them, there is a re-election, so they are behaving as one single cluster
with 3 masters.
I have no problems with the masters, only with the slaves.
I have currently set up the slaves with /etc/mesos/zk set exactly as shown
above, consistently on each network.

Now the leader is one of the masters on Network A, and only the slaves
on Network A can connect to it, but I also need to connect the slave on the
other network.
Do you have any suggestions?
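
The question above comes down to whether a slave on one network can open a TCP connection to every endpoint listed in /etc/mesos/zk, and to the Mesos master port. The following is a minimal sketch of such a check, not a definitive tool. The function names (parse_zk_url, can_connect, check_endpoints) and the 3-second timeout are illustrative assumptions. Port 5050 is the master port shown in the log later in this thread.

```python
import socket


def parse_zk_url(zk_url):
    """Split 'zk://h1:p1,h2:p2/path' into [(host, port), ...].

    Assumption: no 'user:password@' prefix in the URL; this sketch
    does not handle authenticated ZooKeeper URLs.
    """
    prefix = "zk://"
    if not zk_url.startswith(prefix):
        raise ValueError("expected a zk:// URL")
    hosts_part = zk_url[len(prefix):].split("/", 1)[0]
    endpoints = []
    for item in hosts_part.split(","):
        host, _, port = item.strip().partition(":")
        # 2181 is the ZooKeeper client port used throughout this thread.
        endpoints.append((host, int(port) if port else 2181))
    return endpoints


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def check_endpoints(zk_url, extra_ports=(5050,), timeout=3.0):
    """Check the ZooKeeper port and any extra ports on every host in zk_url."""
    results = {}
    for host, zk_port in parse_zk_url(zk_url):
        for port in (zk_port, *extra_ports):
            results[(host, port)] = can_connect(host, port, timeout)
    return results
```

Run on the slave in Network B, check_endpoints("zk://131.154.96.27:2181,131.154.96.32:2181,192.168.10.11:2181/mesos") would return a dictionary mapping each (host, port) pair to True or False. A False entry for a host that answers ping and SSH suggests that the port is filtered, for example by a security group rule.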

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

My mistake, it should read:

"if I have the leader master on Net1, it is able to dispatch a task on SLAVE 3
on Net2."

2016-04-26 10:59 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:

> I finally found a solution.
> On OpenStack I designed this topology:
>
> -----------------------internet-----------------------
>                                 |
>                            Router1
>                                 |
> --------------------------------------------------------
> |                                                                 |
> Net1                                                        Net2
> Master1 Master2                                     Master3
> Slave1 slave2                                          Slave3
>
> It is a simplified view, but this way all the masters and agents are
> reachable through unique hostnames, so I am able to configure ZooKeeper
> uniformly. This topology works fine, meaning that if I have the leader
> master on Net1, it is able to dispatch a task on Master 3 on Net2.
>
>
> 2016-04-14 19:04 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>
>> However, quorum = 1 does not change anything. I guess that I need to
>> set up a DNS.
>> On 14 Apr 2016 17:42, "Stefano Bianchi" <ja...@gmail.com> wrote:
>>
>>> I don't know why, but after setting quorum to 1 on each master the
>>> election no longer fluctuates continuously. I don't know if this is the
>>> right solution.
>>> I tried turning off one of the 2 masters on NetworkA: it goes down, and a
>>> re-election starts between the other master on network A and the master on
>>> network B.
>>> Now the only problem I have is that, if one of the 2 masters on
>>> network A is leading, only the slaves on that network are attached to it.
>>> Conversely, if the master on network B is leading, only the slave on
>>> that network is attached. How can I resolve this?
>>> I would like, for instance, that when the master on Network B is leading,
>>> all 3 slaves (the one on the same network and the 2 on the other network)
>>> are "attached" to that master.
>>> Do you have any suggestions?
>>>
>>> 2016-04-14 16:49 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>>>
>>>> this is the log:
>>>>
>>>> Log file created at: 2016/04/14 14:48:26
>>>> Running on machine: master3.novalocal
>>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>>> I0414 14:48:26.415572 19956 logging.cpp:188] INFO level logging started!
>>>> I0414 14:48:26.416097 19956 main.cpp:230] Build: 2016-03-10 20:32:58 by root
>>>> I0414 14:48:26.416121 19956 main.cpp:232] Version: 0.27.2
>>>> I0414 14:48:26.416133 19956 main.cpp:235] Git tag: 0.27.2
>>>> I0414 14:48:26.416146 19956 main.cpp:239] Git SHA: 3c9ec4a0f34420b7803848af597de00fedefe0e2
>>>> I0414 14:48:26.416205 19956 main.cpp:253] Using 'HierarchicalDRF' allocator
>>>> I0414 14:48:26.448494 19956 leveldb.cpp:174] Opened db in 32.174282ms
>>>> I0414 14:48:26.477005 19956 leveldb.cpp:181] Compacted db in 28.458808ms
>>>> I0414 14:48:26.477056 19956 leveldb.cpp:196] Created db iterator in 9749ns
>>>> I0414 14:48:26.477097 19956 leveldb.cpp:202] Seeked to beginning of db in 20828ns
>>>> I0414 14:48:26.477110 19956 leveldb.cpp:271] Iterated through 0 keys in the db in 596ns
>>>> I0414 14:48:26.477164 19956 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
>>>> I0414 14:48:26.478237 19956 main.cpp:464] Starting Mesos master
>>>> I0414 14:48:26.479887 19956 master.cpp:374] Master 51d6efb6-7611-4b4e-9118-ff7493889545 (131.154.96.156) started on 192.168.10.11:5050
>>>> I0414 14:48:26.480388 19973 log.cpp:236] Attempting to join replica to ZooKeeper group
>>>> I0414 14:48:26.482223 19977 recover.cpp:447] Starting replica recovery
>>>> I0414 14:48:26.479909 19956 master.cpp:376] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_http="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="131.154.96.156" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="2" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://131.154.96.27:2181,131.154.96.32:2181,192.168.10.11:2181/mesos" --zk_session_timeout="10secs"
>>>> I0414 14:48:26.483753 19956 master.cpp:423] Master allowing unauthenticated frameworks to register
>>>> I0414 14:48:26.483772 19956 master.cpp:428] Master allowing unauthenticated slaves to register
>>>> I0414 14:48:26.483789 19956 master.cpp:466] Using default 'crammd5' authenticator
>>>> W0414 14:48:26.483810 19956 authenticator.cpp:511] No credentials provided, authentication requests will be refused
>>>> I0414 14:48:26.484066 19956 authenticator.cpp:518] Initializing server SASL
>>>> I0414 14:48:26.495026 19978 recover.cpp:473] Replica is in EMPTY status
>>>> I0414 14:48:26.498484 19976 master.cpp:1649] Successfully attached file '/var/log/mesos/mesos-master.INFO'
>>>> I0414 14:48:26.498517 19976 contender.cpp:147] Joining the ZK group
>>>> I0414 14:48:26.527865 19972 group.cpp:349] Group process (group(1)@192.168.10.11:5050) connected to ZooKeeper
>>>> I0414 14:48:26.527930 19972 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>>>> I0414 14:48:26.527954 19972 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
>>>> I0414 14:48:26.528306 19976 group.cpp:349] Group process (group(4)@192.168.10.11:5050) connected to ZooKeeper
>>>> I0414 14:48:26.528364 19976 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>>>> I0414 14:48:26.528424 19976 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
>>>> I0414 14:48:26.528740 19971 group.cpp:349] Group process (group(2)@192.168.10.11:5050) connected to ZooKeeper
>>>> I0414 14:48:26.528771 19971 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
>>>> I0414 14:48:26.528805 19971 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
>>>> I0414 14:48:26.534221 19972 network.hpp:413] ZooKeeper group memberships changed
>>>> I0414 14:48:26.534343 19972 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
>>>> I0414 14:48:26.534713 19976 detector.cpp:154] Detected a new leader: (id='57')
>>>> I0414 14:48:26.534843 19976 group.cpp:700] Trying to get '/mesos/json.info_0000000057' in ZooKeeper
>>>> I0414 14:48:26.536515 19973 group.cpp:349] Group process (group(3)@192.168.10.11:5050) connected to ZooKeeper
>>>> I0414 14:48:26.536546 19973 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
>>>> I0414 14:48:26.536559 19973 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
>>>> I0414 14:48:26.541244 19972 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.100.54:5050 }
>>>> I0414 14:48:26.541806 19972 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (5)@192.168.10.11:5050
>>>> I0414 14:48:26.541893 19972 recover.cpp:193] Received a recover response from a replica in EMPTY status
>>>> I0414 14:48:26.542330 19976 detector.cpp:479] A new leading master (UPID=master@192.168.100.54:5050) is detected
>>>> I0414 14:48:26.542408 19976 master.cpp:1710] The newly elected leader is master@192.168.100.54:5050 with id b6031dea-c621-4ba1-9254-87b7449e0d08
>>>> I0414 14:48:26.555027 19976 network.hpp:413] ZooKeeper group memberships changed
>>>> I0414 14:48:26.555173 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
>>>> I0414 14:48:26.556934 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000055' in ZooKeeper
>>>> I0414 14:48:26.558343 19976 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.10.11:5050, log-replica(1)@192.168.100.54:5050 }
>>>> I0414 14:48:26.562963 19971 contender.cpp:263] New candidate (id='58') has entered the contest for leadership
>>>> I0414 14:48:36.496371 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>>>> I0414 14:48:36.496866 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (10)@192.168.10.11:5050
>>>> I0414 14:48:36.496919 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>>>> I0414 14:48:36.963434 19971 http.cpp:501] HTTP GET for /master/state.json from 131.154.5.22:59267 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36 OPR/36.0.2130.46'
>>>> I0414 14:48:46.497448 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>>>> I0414 14:48:46.498134 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (15)@192.168.10.11:5050
>>>> I0414 14:48:46.498247 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>>>> I0414 14:48:56.498900 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>>>> I0414 14:48:56.499447 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (17)@192.168.10.11:5050
>>>> I0414 14:48:56.499526 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>>>>
>>>>
>>>> 2016-04-14 16:27 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>>>>
>>>>> However, now I see a problem with the masters.
>>>>> If I turn off one master on Network A, the master on network B is
>>>>> elected, but after a minute it disconnects and leadership returns to
>>>>> the original one.
>>>>>
>>>>> 2016-04-14 16:26 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>>>>>
>>>>>> In the OpenStack security group the SSH port is open.
>>>>>>
>>>>>>
>>>>>> 2016-04-14 16:24 GMT+02:00 Flavio Junqueira <fp...@apache.org>:
>>>>>>
>>>>>>> Is it an indication that the SSH port is open and the others aren't?
>>>>>>>
>>>>>>> -Flavio
>>>>>>>
>>>>>>> > On 14 Apr 2016, at 15:10, Stefano Bianchi <ja...@gmail.com> wrote:
>>>>>>> >
>>>>>>> > I tried with telnet and I get a connection timeout, but I am able to
>>>>>>> > connect through SSH.
>>>>>>> >
>>>>>>> > 2016-04-14 16:05 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>>>>>>> >
>>>>>>> >> Thanks for your reply, Flavio.
>>>>>>> >> Actually, I don't have a DNS, so I am forced to maintain a hosts file,
>>>>>>> >> in which I have set all the IP addresses.
>>>>>>> >> Of course, for the node in Network B I have set the floating IPs of the
>>>>>>> >> other 2 slaves in network A, associated with their hostnames. I don't
>>>>>>> >> know if this is correct, but at least if I ping from the slave in
>>>>>>> >> Network B to a slave in A I get replies, and vice versa.
>>>>>>> >>
>>>>>>> >> 2016-04-14 15:55 GMT+02:00 Flavio Junqueira <fp...@apache.org>:
>>>>>>> >>
>>>>>>> >>> Have you made sure that a slave in net B is able to telnet or ssh to the
>>>>>>> >>> leader machine in net A? Is it possible that the client port is blocked
>>>>>>> >>> from B to A?
>>>>>>> >>>
>>>>>>> >>> -Flavio
>>>>>>> >>>
>>>>>>> >>>
>

Re: Zookeeper mesos-master on different network

Posted by will martin <wm...@outlook.com>.

nice

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

I finally found a solution.
On OpenStack I designed this topology:

-----------------------internet-----------------------
                                |
                           Router1
                                |
--------------------------------------------------------
|                                                                 |
Net1                                                        Net2
Master1 Master2                                     Master3
Slave1 slave2                                          Slave3

It is a simplified view, but this way all the masters and agents are
reachable through unique hostnames, so I am able to configure ZooKeeper
uniformly. This topology works fine, meaning that if I have the leader
master on Net1, it is able to dispatch a task on Master 3 on Net2.
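
To illustrate what "uniformly" means here: once every node is reachable under one unique hostname, the same server list and the same zk:// URL can be used on every machine in both networks. The sketch below only generates those strings. The hostnames master1, master2 and master3 and the two function names are hypothetical. The ports 2888, 3888 and 2181 and the /mesos path are the ones used earlier in this thread.

```python
def zoo_cfg_server_lines(hostnames, peer_port=2888, election_port=3888):
    """Build the 'server.N=host:peer:election' lines for zoo.cfg.

    The same lines go on every ZooKeeper node; N starts at 1 and follows
    the order of the hostname list.
    """
    return [
        "server.%d=%s:%d:%d" % (index, host, peer_port, election_port)
        for index, host in enumerate(hostnames, start=1)
    ]


def mesos_zk_url(hostnames, client_port=2181, path="/mesos"):
    """Build the zk:// URL stored in /etc/mesos/zk on masters and slaves."""
    hosts = ",".join("%s:%d" % (host, client_port) for host in hostnames)
    return "zk://%s%s" % (hosts, path)


# Hypothetical hostnames; replace them with names that resolve on every node.
masters = ["master1", "master2", "master3"]
server_lines = zoo_cfg_server_lines(masters)
zk_url = mesos_zk_url(masters)
```

With this approach no node needs a network-specific variant of the files. Each ZooKeeper server still needs its own myid file whose number matches its server.N entry.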


2016-04-14 19:04 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:

> However quorum = 1 does not change anything. I guess that i beed to
> implement a DNS.

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

However, quorum = 1 does not change anything. I guess that I need to
set up a DNS.
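
Before setting up DNS, it may help to check name resolution and port
reachability from a slave toward each master. Below is a minimal sketch
in Python (an illustration, not something I ran on this cluster). The
ports are the ones that appear in this thread: 2181 (ZooKeeper client),
2888/3888 (ZooKeeper peers) and 5050 (Mesos master). If I read the log
correctly, the leader is advertised as master@192.168.100.54:5050, a
private Network A address, so that exact address and port would have to
be reachable from Network B, not only the floating IP.

```python
import socket

# Ports taken from the configuration and log quoted in this thread.
PORTS = (2181, 2888, 3888, 5050)


def check_endpoint(host, port, timeout=3.0):
    """Resolve host and try a TCP connection.

    Returns (ip, reachable); ip is None when the name does not resolve.
    """
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        return None, False
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        # Covers refused connections and timeouts (e.g. a security group drop).
        return ip, False


def report(hosts, ports=PORTS):
    """Print one status line per host/port pair."""
    for host in hosts:
        for port in ports:
            ip, ok = check_endpoint(host, port)
            if ip is None:
                status = "unresolved"
            else:
                status = "open" if ok else "blocked or closed"
            print(f"{host}:{port} ({ip}) -> {status}")
```

Calling report(["192.168.100.54", "131.154.96.27"]) from the slave on
Network B would show whether the private and the floating address of the
same master behave differently. A timeout (rather than an immediate
refusal) usually points at filtering, which matches the telnet result
mentioned earlier in the thread.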
On 14 Apr 2016 at 17:42, "Stefano Bianchi" <ja...@gmail.com> wrote:

> I don't know why, but after setting quorum to 1 on each master I no longer
> see elections fluctuating continuously. I don't know if this is the right
> solution.
> I tried turning off one of the 2 masters on Network A: it goes down, and a
> re-election starts between the other master on Network A and the master on
> Network B.
> Now the only problem I have is that, if one of the 2 masters on Network A
> is leading, only the slaves on that network are attached to it.
> Conversely, if the master on Network B is leading, only the slave on that
> network is attached. How can I resolve this?
> For instance, when the master on Network B is leading, I would like all 3
> slaves (the one on the same network and the 2 on the other network) to be
> "attached" to that master.
> Do you have any suggestions?
>
> 2016-04-14 16:49 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>
>> this is the log:
>>
>> Log file created at: 2016/04/14 14:48:26
>> Running on machine: master3.novalocal
>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>> I0414 14:48:26.415572 19956 logging.cpp:188] INFO level logging started!
>> I0414 14:48:26.416097 19956 main.cpp:230] Build: 2016-03-10 20:32:58 by root
>> I0414 14:48:26.416121 19956 main.cpp:232] Version: 0.27.2
>> I0414 14:48:26.416133 19956 main.cpp:235] Git tag: 0.27.2
>> I0414 14:48:26.416146 19956 main.cpp:239] Git SHA: 3c9ec4a0f34420b7803848af597de00fedefe0e2
>> I0414 14:48:26.416205 19956 main.cpp:253] Using 'HierarchicalDRF' allocator
>> I0414 14:48:26.448494 19956 leveldb.cpp:174] Opened db in 32.174282ms
>> I0414 14:48:26.477005 19956 leveldb.cpp:181] Compacted db in 28.458808ms
>> I0414 14:48:26.477056 19956 leveldb.cpp:196] Created db iterator in 9749ns
>> I0414 14:48:26.477097 19956 leveldb.cpp:202] Seeked to beginning of db in 20828ns
>> I0414 14:48:26.477110 19956 leveldb.cpp:271] Iterated through 0 keys in the db in 596ns
>> I0414 14:48:26.477164 19956 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
>> I0414 14:48:26.478237 19956 main.cpp:464] Starting Mesos master
>> I0414 14:48:26.479887 19956 master.cpp:374] Master 51d6efb6-7611-4b4e-9118-ff7493889545 (131.154.96.156) started on 192.168.10.11:5050
>> I0414 14:48:26.480388 19973 log.cpp:236] Attempting to join replica to ZooKeeper group
>> I0414 14:48:26.482223 19977 recover.cpp:447] Starting replica recovery
>> I0414 14:48:26.479909 19956 master.cpp:376] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_http="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="131.154.96.156" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="2" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://131.154.96.27:2181,131.154.96.32:2181,192.168.10.11:2181/mesos" --zk_session_timeout="10secs"
>> I0414 14:48:26.483753 19956 master.cpp:423] Master allowing unauthenticated frameworks to register
>> I0414 14:48:26.483772 19956 master.cpp:428] Master allowing unauthenticated slaves to register
>> I0414 14:48:26.483789 19956 master.cpp:466] Using default 'crammd5' authenticator
>> W0414 14:48:26.483810 19956 authenticator.cpp:511] No credentials provided, authentication requests will be refused
>> I0414 14:48:26.484066 19956 authenticator.cpp:518] Initializing server SASL
>> I0414 14:48:26.495026 19978 recover.cpp:473] Replica is in EMPTY status
>> I0414 14:48:26.498484 19976 master.cpp:1649] Successfully attached file '/var/log/mesos/mesos-master.INFO'
>> I0414 14:48:26.498517 19976 contender.cpp:147] Joining the ZK group
>> I0414 14:48:26.527865 19972 group.cpp:349] Group process (group(1)@192.168.10.11:5050) connected to ZooKeeper
>> I0414 14:48:26.527930 19972 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>> I0414 14:48:26.527954 19972 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
>> I0414 14:48:26.528306 19976 group.cpp:349] Group process (group(4)@192.168.10.11:5050) connected to ZooKeeper
>> I0414 14:48:26.528364 19976 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>> I0414 14:48:26.528424 19976 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
>> I0414 14:48:26.528740 19971 group.cpp:349] Group process (group(2)@192.168.10.11:5050) connected to ZooKeeper
>> I0414 14:48:26.528771 19971 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
>> I0414 14:48:26.528805 19971 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
>> I0414 14:48:26.534221 19972 network.hpp:413] ZooKeeper group memberships changed
>> I0414 14:48:26.534343 19972 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
>> I0414 14:48:26.534713 19976 detector.cpp:154] Detected a new leader: (id='57')
>> I0414 14:48:26.534843 19976 group.cpp:700] Trying to get '/mesos/json.info_0000000057' in ZooKeeper
>> I0414 14:48:26.536515 19973 group.cpp:349] Group process (group(3)@192.168.10.11:5050) connected to ZooKeeper
>> I0414 14:48:26.536546 19973 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
>> I0414 14:48:26.536559 19973 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
>> I0414 14:48:26.541244 19972 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.100.54:5050 }
>> I0414 14:48:26.541806 19972 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (5)@192.168.10.11:5050
>> I0414 14:48:26.541893 19972 recover.cpp:193] Received a recover response from a replica in EMPTY status
>> I0414 14:48:26.542330 19976 detector.cpp:479] A new leading master (UPID=master@192.168.100.54:5050) is detected
>> I0414 14:48:26.542408 19976 master.cpp:1710] The newly elected leader is master@192.168.100.54:5050 with id b6031dea-c621-4ba1-9254-87b7449e0d08
>> I0414 14:48:26.555027 19976 network.hpp:413] ZooKeeper group memberships changed
>> I0414 14:48:26.555173 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
>> I0414 14:48:26.556934 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000055' in ZooKeeper
>> I0414 14:48:26.558343 19976 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.10.11:5050, log-replica(1)@192.168.100.54:5050 }
>> I0414 14:48:26.562963 19971 contender.cpp:263] New candidate (id='58') has entered the contest for leadership
>> I0414 14:48:36.496371 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>> I0414 14:48:36.496866 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (10)@192.168.10.11:5050
>> I0414 14:48:36.496919 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>> I0414 14:48:36.963434 19971 http.cpp:501] HTTP GET for /master/state.json from 131.154.5.22:59267 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36 OPR/36.0.2130.46'
>> I0414 14:48:46.497448 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>> I0414 14:48:46.498134 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (15)@192.168.10.11:5050
>> I0414 14:48:46.498247 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>> I0414 14:48:56.498900 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>> I0414 14:48:56.499447 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (17)@192.168.10.11:5050
>> I0414 14:48:56.499526 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>>
>>
>> 2016-04-14 16:27 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>>
>>> However, now I see a problem with the masters.
>>> If I turn off one master on Network A, the master on Network B is
>>> elected, but after a minute it disconnects and leadership goes back to
>>> the original one.
>>>
>>> 2016-04-14 16:26 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>>>
>>>> In the OpenStack security group the SSH port is open.
>>>>
>>>>
>>>> 2016-04-14 16:24 GMT+02:00 Flavio Junqueira <fp...@apache.org>:
>>>>
>>>>> Is it an indication that the SSH port is open and the others aren't?
>>>>>
>>>>> -Flavio
>>>>>
>>>>> > On 14 Apr 2016, at 15:10, Stefano Bianchi <ja...@gmail.com> wrote:
>>>>> >
>>>>> > I tried with telnet and the connection timed out, but I am able to
>>>>> > connect through SSH.
>>>>> >
>>>>> > 2016-04-14 16:05 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>>>>> >
>>>>> >> Thanks for your reply, Flavio.
>>>>> >> Actually, I don't have a DNS, so I am forced to maintain the hosts
>>>>> >> file, in which I have set all the IP addresses.
>>>>> >> Of course, for the node in Network B I have set the floating IPs of
>>>>> >> the other 2 slaves in Network A, associated with their hostnames. I
>>>>> >> don't know if this is correct, but at least if I ping from the slave
>>>>> >> in Network B to a slave in A I get replies, and vice versa.
>>>>> >>
>>>>> >> 2016-04-14 15:55 GMT+02:00 Flavio Junqueira <fp...@apache.org>:
>>>>> >>
>>>>> >>> Have you made sure that a slave in net B is able to telnet or ssh to
>>>>> >>> the leader machine in net A? Is it possible that the client port is
>>>>> >>> blocked from B to A?
>>>>> >>>
>>>>> >>> -Flavio
>>>>> >>>
>>>>> >>>
>>>>> >>>> On 14 Apr 2016, at 14:09, Stefano Bianchi <ja...@gmail.com> wrote:
>>>>> >>>>
>>>>> >>>> Hi all,
>>>>> >>>> I'm working on OpenStack and I have built some virtual machines and
>>>>> >>>> 2 different networks with it.
>>>>> >>>> I have set up two Mesos clusters:
>>>>> >>>>
>>>>> >>>> NetworkA:
>>>>> >>>> 2 mesos masters
>>>>> >>>> 2 mesos slaves
>>>>> >>>>
>>>>> >>>> NetworkB:
>>>>> >>>> 1 mesos master
>>>>> >>>> 1 mesos slave
>>>>> >>>>
>>>>> >>>> I am trying to set up an interconnection between these two clusters.
>>>>> >>>>
>>>>> >>>> I have set the ZooKeeper configurations such that all 3 masters are
>>>>> >>>> competing for the leadership. Here are the main configurations:
>>>>> >>>>
>>>>> >>>> NetworkA, on both masters:
>>>>> >>>>
>>>>> >>>> */etc/zookeeper/conf/zoo.cfg*: at the end of the file
>>>>> >>>>
>>>>> >>>> server.1=192.168.100.54:2888:3888 (master1 on network A)
>>>>> >>>>
>>>>> >>>> server.2=192.168.100.55:2888:3888 (master2 on network A)
>>>>> >>>>
>>>>> >>>> server.3=131.154.xxx.xxx:2888:3888 (master3 on network B, I have set
>>>>> >>>> a floating IP)
>>>>> >>>>
>>>>> >>>> */etc/mesos/zk*
>>>>> >>>>
>>>>> >>>> zk://192.168.100.54:2181,192.168.100.55:2181,131.154.xxx.xxx:2181/mesos
>>>>> >>>>
>>>>> >>>> NetworkB:
>>>>> >>>>
>>>>> >>>> */etc/zookeeper/conf/zoo.cfg*: at the end of the file
>>>>> >>>>
>>>>> >>>> server.1=131.154.96.27:2888:3888 (master1 on network A, I have set a
>>>>> >>>> floating IP)
>>>>> >>>>
>>>>> >>>> server.2=131.154.96.32:2888:3888 (master2 on network A, I have set a
>>>>> >>>> floating IP)
>>>>> >>>>
>>>>> >>>> server.3=192.168.10.11:2888:3888 (master3 on network B)
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> */etc/mesos/zk:*
>>>>> >>>>
>>>>> >>>> zk://131.154.zzz.zzz:2181,131.154.yyy.yyy:2181,192.168.10.11:2181/mesos
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> The 3 masters seem to work fine: if I stop the mesos-master service
>>>>> >>>> on one of them, there is a re-election, so they behave as one single
>>>>> >>>> cluster with 3 masters.
>>>>> >>>> I have no problems with the masters, only with the slaves.
>>>>> >>>> I have currently set up the slaves with /etc/mesos/zk exactly as
>>>>> >>>> shown above, in a consistent way.
>>>>> >>>>
>>>>> >>>> Now the leader is a master on Network A, and only the slaves on
>>>>> >>>> Network A can connect to it, but I also need to connect the slave on
>>>>> >>>> the other network.
>>>>> >>>> Do you have any suggestions?
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

I don't know why, but after setting quorum to 1 on each master I no longer
see elections fluctuating continuously. I don't know if this is the right
solution.
I tried turning off one of the 2 masters on Network A: it goes down, and a
re-election starts between the other master on Network A and the master on
Network B.
Now the only problem I have is that, if one of the 2 masters on Network A
is leading, only the slaves on that network are attached to it.
Conversely, if the master on Network B is leading, only the slave on that
network is attached. How can I resolve this?
For instance, when the master on Network B is leading, I would like all 3
slaves (the one on the same network and the 2 on the other network) to be
"attached" to that master.
Do you have any suggestions?
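
A note on the quorum value, as a hedged sketch rather than a verified fix:
as far as I understand the Mesos documentation, --quorum for the
replicated_log registry is meant to be a strict majority of the masters.
The small Python helper below only does that arithmetic; it is an
illustration and not part of Mesos. For the 3 masters here it gives 2,
which is the value the quoted log shows (--quorum="2"), so quorum = 1
probably hides the symptom instead of fixing the connectivity problem.

```python
def majority_quorum(num_masters):
    """Smallest quorum that is a strict majority of num_masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} masters -> quorum {majority_quorum(n)}")
```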

2016-04-14 16:49 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:

> this is the log:
>
> Log file created at: 2016/04/14 14:48:26
> Running on machine: master3.novalocal
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> I0414 14:48:26.415572 19956 logging.cpp:188] INFO level logging started!
> I0414 14:48:26.416097 19956 main.cpp:230] Build: 2016-03-10 20:32:58 by root
> I0414 14:48:26.416121 19956 main.cpp:232] Version: 0.27.2
> I0414 14:48:26.416133 19956 main.cpp:235] Git tag: 0.27.2
> I0414 14:48:26.416146 19956 main.cpp:239] Git SHA: 3c9ec4a0f34420b7803848af597de00fedefe0e2
> I0414 14:48:26.416205 19956 main.cpp:253] Using 'HierarchicalDRF' allocator
> I0414 14:48:26.448494 19956 leveldb.cpp:174] Opened db in 32.174282ms
> I0414 14:48:26.477005 19956 leveldb.cpp:181] Compacted db in 28.458808ms
> I0414 14:48:26.477056 19956 leveldb.cpp:196] Created db iterator in 9749ns
> I0414 14:48:26.477097 19956 leveldb.cpp:202] Seeked to beginning of db in 20828ns
> I0414 14:48:26.477110 19956 leveldb.cpp:271] Iterated through 0 keys in the db in 596ns
> I0414 14:48:26.477164 19956 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
> I0414 14:48:26.478237 19956 main.cpp:464] Starting Mesos master
> I0414 14:48:26.479887 19956 master.cpp:374] Master 51d6efb6-7611-4b4e-9118-ff7493889545 (131.154.96.156) started on 192.168.10.11:5050
> I0414 14:48:26.480388 19973 log.cpp:236] Attempting to join replica to ZooKeeper group
> I0414 14:48:26.482223 19977 recover.cpp:447] Starting replica recovery
> I0414 14:48:26.479909 19956 master.cpp:376] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_http="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="131.154.96.156" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="2" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://131.154.96.27:2181,131.154.96.32:2181,192.168.10.11:2181/mesos" --zk_session_timeout="10secs"
> I0414 14:48:26.483753 19956 master.cpp:423] Master allowing unauthenticated frameworks to register
> I0414 14:48:26.483772 19956 master.cpp:428] Master allowing unauthenticated slaves to register
> I0414 14:48:26.483789 19956 master.cpp:466] Using default 'crammd5' authenticator
> W0414 14:48:26.483810 19956 authenticator.cpp:511] No credentials provided, authentication requests will be refused
> I0414 14:48:26.484066 19956 authenticator.cpp:518] Initializing server SASL
> I0414 14:48:26.495026 19978 recover.cpp:473] Replica is in EMPTY status
> I0414 14:48:26.498484 19976 master.cpp:1649] Successfully attached file '/var/log/mesos/mesos-master.INFO'
> I0414 14:48:26.498517 19976 contender.cpp:147] Joining the ZK group
> I0414 14:48:26.527865 19972 group.cpp:349] Group process (group(1)@192.168.10.11:5050) connected to ZooKeeper
> I0414 14:48:26.527930 19972 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
> I0414 14:48:26.527954 19972 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
> I0414 14:48:26.528306 19976 group.cpp:349] Group process (group(4)@192.168.10.11:5050) connected to ZooKeeper
> I0414 14:48:26.528364 19976 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
> I0414 14:48:26.528424 19976 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
> I0414 14:48:26.528740 19971 group.cpp:349] Group process (group(2)@192.168.10.11:5050) connected to ZooKeeper
> I0414 14:48:26.528771 19971 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
> I0414 14:48:26.528805 19971 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
> I0414 14:48:26.534221 19972 network.hpp:413] ZooKeeper group memberships changed
> I0414 14:48:26.534343 19972 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
> I0414 14:48:26.534713 19976 detector.cpp:154] Detected a new leader: (id='57')
> I0414 14:48:26.534843 19976 group.cpp:700] Trying to get '/mesos/json.info_0000000057' in ZooKeeper
> I0414 14:48:26.536515 19973 group.cpp:349] Group process (group(3)@192.168.10.11:5050) connected to ZooKeeper
> I0414 14:48:26.536546 19973 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
> I0414 14:48:26.536559 19973 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
> I0414 14:48:26.541244 19972 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.100.54:5050 }
> I0414 14:48:26.541806 19972 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (5)@192.168.10.11:5050
> I0414 14:48:26.541893 19972 recover.cpp:193] Received a recover response from a replica in EMPTY status
> I0414 14:48:26.542330 19976 detector.cpp:479] A new leading master (UPID=master@192.168.100.54:5050) is detected
> I0414 14:48:26.542408 19976 master.cpp:1710] The newly elected leader is master@192.168.100.54:5050 with id b6031dea-c621-4ba1-9254-87b7449e0d08
> I0414 14:48:26.555027 19976 network.hpp:413] ZooKeeper group memberships changed
> I0414 14:48:26.555173 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
> I0414 14:48:26.556934 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000055' in ZooKeeper
> I0414 14:48:26.558343 19976 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.10.11:5050, log-replica(1)@192.168.100.54:5050 }
> I0414 14:48:26.562963 19971 contender.cpp:263] New candidate (id='58') has entered the contest for leadership
> I0414 14:48:36.496371 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
> I0414 14:48:36.496866 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (10)@192.168.10.11:5050
> I0414 14:48:36.496919 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
> I0414 14:48:36.963434 19971 http.cpp:501] HTTP GET for /master/state.json from 131.154.5.22:59267 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36 OPR/36.0.2130.46'
> I0414 14:48:46.497448 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
> I0414 14:48:46.498134 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (15)@192.168.10.11:5050
> I0414 14:48:46.498247 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
> I0414 14:48:56.498900 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
> I0414 14:48:56.499447 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (17)@192.168.10.11:5050
> I0414 14:48:56.499526 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>
>
> 2016-04-14 16:27 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>
>> However now i perceive a problem with masters.
>> If i turn off one master on Network A the the master on network B is
>> elected but after a minute it will disconnect, coming back to the original
>> one.
>>
>> 2016-04-14 16:26 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>>
>>> on openstack security group the ssh port is open.
>>>
>>>
>>> 2016-04-14 16:24 GMT+02:00 Flavio Junqueira <fp...@apache.org>:
>>>
>>>> Is it an indication that the SSH port is open and the others aren't?
>>>>
>>>> -Flavio
>>>>
>>>> > On 14 Apr 2016, at 15:10, Stefano Bianchi <ja...@gmail.com>
>>>> wrote:
>>>> >
>>>> > I tried with telnet and i have connection timed out, but i am able to
>>>> > connect trough SSH
>>>> >
>>>> > 2016-04-14 16:05 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:
>>>> >
>>>> >> Thanks for your reply Flavio.
>>>> >> Actually, i don't have a DNS, so i am foced to type hosts file, in
>>>> which i
>>>> >> have set all the IP addrsses.
>>>> >> Of course for the note in Network B i have set the Floating IP of the
>>>> >> other 2 slaves in network A associated to their hostname. Actually i
>>>> don't
>>>> >> know if it is correct, but at least if i make a ping from the slave
>>>> in
>>>> >> Network B to a slave in A i obtain replies. and vice versa.
>>>> >>
>>>> >> 2016-04-14 15:55 GMT+02:00 Flavio Junqueira <fp...@apache.org>:
>>>> >>
>>>> >>> Have you made sure that a slave in net B is able to telnet or ssh
>>>> to the
>>>> >>> leader machine in net A? Is it possible that the client port is
>>>> blocker
>>>> >>> from B to A?
>>>> >>>
>>>> >>> -Flavio
>>>> >>>
>>>> >>>
>>>> >>>> On 14 Apr 2016, at 14:09, Stefano Bianchi <ja...@gmail.com>
>>>> wrote:
>>>> >>>>
>>>> >>>> Hi all
>>>> >>>> i'm working on OpenStack and i have build come virtual machines
>>>> and 2
>>>> >>>> different networks with it.
>>>> >>>> I have set two mesos clusters:
>>>> >>>>
>>>> >>>> NetworkA:
>>>> >>>> 2 mesos master
>>>> >>>> 2 mesos slaves
>>>> >>>>
>>>> >>>> NetworkB:
>>>> >>>> 1 mesos master
>>>> >>>> 1 mesos slave
>>>> >>>>
>>>> >>>> I should try to make and interconnection between these two
>>>> clusters.
>>>> >>>>
>>>> >>>> I have set zookeeper configurations such that all 3 masters are
>>>> >>> competing
>>>> >>>> for he leadership. I show you the main configurations:
>>>> >>>>
>>>> >>>> NetworkA on both 2 masters:
>>>> >>>>
>>>> >>>> */etc/zookeeper/conf/zoo.cfg *: at the end of the file
>>>> >>>>
>>>> >>>> server.1=192.168.100.54:2888:3888 (master1 on network A)
>>>> >>>>
>>>> >>>> server.2=192.168.100.55:2888:3888 (master2 on network A)
>>>> >>>>
>>>> >>>> server.3=131.154.xxx.xxx:2888:3888 (Master3 on network B, i have
>>>> set
>>>> >>>> floating IP)
>>>> >>>>
>>>> >>>> *etc/mesos/zk*
>>>> >>>>
>>>> >>>> zk://192.168.100.54:2181,192.168.100.55:2181
>>>> ,131.154.xxx.xxx:2181/mesos
>>>> >>>>
>>>> >>>> NetorkB:
>>>> >>>>
>>>> >>>> */etc/zookeeper/conf/zoo.cfg: at the end of the file:*
>>>> >>>>
>>>> >>>> server.1=131.154.96.27:2888:3888 (master1 on network A, i have set
>>>> >>> floating
>>>> >>>> IP)
>>>> >>>>
>>>> >>>> server.2=131.154.96.32:2888:3888 (master2 on network A, i have set
>>>> >>> floating
>>>> >>>> IP)
>>>> >>>>
>>>> >>>> server.3=192.168.10.11:2888:3888 (Master3 on network B)
>>>> >>>>
>>>> >>>>
>>>> >>>> *etc/mesos/zk:*
>>>> >>>>
>>>> >>>> zk://131.154.zzz.zzz:2181,131.154.yyy.yyy:2181,
>>>> 192.168.10.11:2181/mesos
>>>> >>>>
>>>> >>>>
>>>> >>>> the 3 masters seems to work fine, if i stop mesos-master service
>>>> on one
>>>> >>> of
>>>> >>>> them, there is the rielection, so they are behaving as one single
>>>> >>> cluster
>>>> >>>> with 3 masters.
>>>> >>>> I have no problems with masters, but with slaves.
>>>> >>>> I have currenty set up slaves setting the /etc/mesos/zk exactly as
>>>> i
>>>> >>> shown
>>>> >>>> above in a coherent way.
>>>> >>>>
>>>> >>>> Now the leader s one master which is on the Network A, and only the
>>>> >>> slaves
>>>> >>>> on Network A can connect to it, but i need to connect also the
>>>> slave on
>>>> >>> the
>>>> >>>> other network.
>>>> >>>> Do you have suggestions?
>>>> >>>
>>>> >>>
>>>> >>
>>>>
>>>>
>>>
>>
>

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

This is the log:

Log file created at: 2016/04/14 14:48:26
Running on machine: master3.novalocal
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0414 14:48:26.415572 19956 logging.cpp:188] INFO level logging started!
I0414 14:48:26.416097 19956 main.cpp:230] Build: 2016-03-10 20:32:58 by root
I0414 14:48:26.416121 19956 main.cpp:232] Version: 0.27.2
I0414 14:48:26.416133 19956 main.cpp:235] Git tag: 0.27.2
I0414 14:48:26.416146 19956 main.cpp:239] Git SHA: 3c9ec4a0f34420b7803848af597de00fedefe0e2
I0414 14:48:26.416205 19956 main.cpp:253] Using 'HierarchicalDRF' allocator
I0414 14:48:26.448494 19956 leveldb.cpp:174] Opened db in 32.174282ms
I0414 14:48:26.477005 19956 leveldb.cpp:181] Compacted db in 28.458808ms
I0414 14:48:26.477056 19956 leveldb.cpp:196] Created db iterator in 9749ns
I0414 14:48:26.477097 19956 leveldb.cpp:202] Seeked to beginning of db in 20828ns
I0414 14:48:26.477110 19956 leveldb.cpp:271] Iterated through 0 keys in the db in 596ns
I0414 14:48:26.477164 19956 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I0414 14:48:26.478237 19956 main.cpp:464] Starting Mesos master
I0414 14:48:26.479887 19956 master.cpp:374] Master 51d6efb6-7611-4b4e-9118-ff7493889545 (131.154.96.156) started on 192.168.10.11:5050
I0414 14:48:26.480388 19973 log.cpp:236] Attempting to join replica to ZooKeeper group
I0414 14:48:26.482223 19977 recover.cpp:447] Starting replica recovery
I0414 14:48:26.479909 19956 master.cpp:376] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_http="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="131.154.96.156" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="2" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://131.154.96.27:2181,131.154.96.32:2181,192.168.10.11:2181/mesos" --zk_session_timeout="10secs"
I0414 14:48:26.483753 19956 master.cpp:423] Master allowing unauthenticated frameworks to register
I0414 14:48:26.483772 19956 master.cpp:428] Master allowing unauthenticated slaves to register
I0414 14:48:26.483789 19956 master.cpp:466] Using default 'crammd5' authenticator
W0414 14:48:26.483810 19956 authenticator.cpp:511] No credentials provided, authentication requests will be refused
I0414 14:48:26.484066 19956 authenticator.cpp:518] Initializing server SASL
I0414 14:48:26.495026 19978 recover.cpp:473] Replica is in EMPTY status
I0414 14:48:26.498484 19976 master.cpp:1649] Successfully attached file '/var/log/mesos/mesos-master.INFO'
I0414 14:48:26.498517 19976 contender.cpp:147] Joining the ZK group
I0414 14:48:26.527865 19972 group.cpp:349] Group process (group(1)@192.168.10.11:5050) connected to ZooKeeper
I0414 14:48:26.527930 19972 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0414 14:48:26.527954 19972 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
I0414 14:48:26.528306 19976 group.cpp:349] Group process (group(4)@192.168.10.11:5050) connected to ZooKeeper
I0414 14:48:26.528364 19976 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0414 14:48:26.528424 19976 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I0414 14:48:26.528740 19971 group.cpp:349] Group process (group(2)@192.168.10.11:5050) connected to ZooKeeper
I0414 14:48:26.528771 19971 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
I0414 14:48:26.528805 19971 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
I0414 14:48:26.534221 19972 network.hpp:413] ZooKeeper group memberships changed
I0414 14:48:26.534343 19972 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
I0414 14:48:26.534713 19976 detector.cpp:154] Detected a new leader: (id='57')
I0414 14:48:26.534843 19976 group.cpp:700] Trying to get '/mesos/json.info_0000000057' in ZooKeeper
I0414 14:48:26.536515 19973 group.cpp:349] Group process (group(3)@192.168.10.11:5050) connected to ZooKeeper
I0414 14:48:26.536546 19973 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
I0414 14:48:26.536559 19973 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I0414 14:48:26.541244 19972 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.100.54:5050 }
I0414 14:48:26.541806 19972 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (5)@192.168.10.11:5050
I0414 14:48:26.541893 19972 recover.cpp:193] Received a recover response from a replica in EMPTY status
I0414 14:48:26.542330 19976 detector.cpp:479] A new leading master (UPID=master@192.168.100.54:5050) is detected
I0414 14:48:26.542408 19976 master.cpp:1710] The newly elected leader is master@192.168.100.54:5050 with id b6031dea-c621-4ba1-9254-87b7449e0d08
I0414 14:48:26.555027 19976 network.hpp:413] ZooKeeper group memberships changed
I0414 14:48:26.555173 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
I0414 14:48:26.556934 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000055' in ZooKeeper
I0414 14:48:26.558343 19976 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.10.11:5050, log-replica(1)@192.168.100.54:5050 }
I0414 14:48:26.562963 19971 contender.cpp:263] New candidate (id='58') has entered the contest for leadership
I0414 14:48:36.496371 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
I0414 14:48:36.496866 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (10)@192.168.10.11:5050
I0414 14:48:36.496919 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
I0414 14:48:36.963434 19971 http.cpp:501] HTTP GET for /master/state.json from 131.154.5.22:59267 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36 OPR/36.0.2130.46'
I0414 14:48:46.497448 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
I0414 14:48:46.498134 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (15)@192.168.10.11:5050
I0414 14:48:46.498247 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
I0414 14:48:56.498900 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
I0414 14:48:56.499447 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (17)@192.168.10.11:5050
I0414 14:48:56.499526 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
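
Editorial note: the log header above states the line format as "[IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg". As a reading aid only, the following is a minimal Python sketch that parses lines in that format and lists the timestamps of the repeated "Unable to finish the recover protocol" messages. The function names are made up for this illustration and are not part of Mesos, and the sketch assumes each log entry sits on a single line (entries that a mail client has hard-wrapped would need to be rejoined first).

```python
import re

# Regular expression for the format stated in the log header:
# [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\s:]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_glog_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = GLOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def recover_retries(lines):
    """Collect the timestamps of 'Unable to finish the recover protocol' entries."""
    stamps = []
    for line in lines:
        entry = parse_glog_line(line)
        if entry and entry["msg"].startswith("Unable to finish the recover protocol"):
            stamps.append(entry["time"])
    return stamps


# Three lines copied from the log in this message.
sample = [
    "I0414 14:48:26.495026 19978 recover.cpp:473] Replica is in EMPTY status",
    "I0414 14:48:36.496371 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying",
    "I0414 14:48:46.497448 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying",
]
print(recover_retries(sample))
```

Run against the full log, this only shows how often the retry message appears; it does not explain why the recovery does not complete.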


2016-04-14 16:27 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:

> However, I now see a problem with the masters.
> If I turn off one master on Network A, the master on Network B is
> elected, but after a minute it disconnects and leadership returns to the
> original one.

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

However, I now see a problem with the masters.
If I turn off one master on Network A, the master on Network B is
elected, but after a minute it disconnects and leadership returns to the
original one.

2016-04-14 16:26 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:

> In the OpenStack security group, the SSH port is open.

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

In the OpenStack security group, the SSH port is open.


2016-04-14 16:24 GMT+02:00 Flavio Junqueira <fp...@apache.org>:

> Is it an indication that the SSH port is open and the others aren't?
>
> -Flavio

Re: Zookeeper mesos-master on different network

Posted by Flavio Junqueira <fp...@apache.org>.

Is it an indication that the SSH port is open and the others aren't?

-Flavio
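
Editorial note: one way to answer the question above is to test each relevant TCP port separately rather than relying on SSH or ping. The following is a minimal Python sketch under stated assumptions: the port list is taken from the configuration quoted in this thread (2181 for ZooKeeper clients, 2888 and 3888 from the server.N lines, 5050 for the Mesos master) plus 22 for SSH, and the helper names are invented for this illustration.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (reachable) or False (refused, filtered, or timed out)."""
    return {port: port_is_open(host, port, timeout) for port in ports}


# Ports mentioned in the thread: 22 (SSH), 2181 (ZooKeeper client),
# 2888 and 3888 (ZooKeeper quorum and leader election), 5050 (Mesos master).
PORTS = [22, 2181, 2888, 3888, 5050]
```

For example, calling check_ports("131.154.96.27", PORTS) from the slave on Network B would show which of these ports on the floating IP of master1 accept connections. If only port 22 comes back True, that would be consistent with a security group that allows SSH but not the other ports.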

> On 14 Apr 2016, at 15:10, Stefano Bianchi <ja...@gmail.com> wrote:
> 
> I tried with telnet and I get a connection timeout, but I am able to
> connect through SSH.

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

Flavio, in your opinion, is my setup of the ZooKeeper files correct?
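
Editorial note: part of this question can be checked mechanically. The following is a rough Python sketch, not an official ZooKeeper tool, that extracts the server.N entries from two zoo.cfg texts and reports ids that are missing on one side or whose ports differ. The function names are hypothetical, and the sample entries are the ones posted at the start of this thread (including the masked 131.154.xxx.xxx address).

```python
import re

# Matches entries such as: server.1=192.168.100.54:2888:3888
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)")


def parse_servers(cfg_text):
    """Return {server_id: (host, quorum_port, election_port)} from zoo.cfg text."""
    servers = {}
    for raw in cfg_text.splitlines():
        match = SERVER_LINE.match(raw.strip())
        if match:
            sid, host, quorum, election = match.groups()
            servers[int(sid)] = (host, int(quorum), int(election))
    return servers


def compare_ensembles(cfg_a, cfg_b):
    """Report server ids missing on either side and ids whose ports differ."""
    a, b = parse_servers(cfg_a), parse_servers(cfg_b)
    shared = set(a) & set(b)
    return {
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "port_mismatch": sorted(i for i in shared if a[i][1:] != b[i][1:]),
    }


network_a = """
server.1=192.168.100.54:2888:3888
server.2=192.168.100.55:2888:3888
server.3=131.154.xxx.xxx:2888:3888
"""
network_b = """
server.1=131.154.96.27:2888:3888
server.2=131.154.96.32:2888:3888
server.3=192.168.10.11:2888:3888
"""
print(compare_ensembles(network_a, network_b))
```

The host addresses are deliberately not compared, because each network lists the other side by floating IP. A clean result therefore only says that the ids and ports line up; it says nothing about whether those addresses are reachable from each machine, or whether each machine's myid file matches its server.N entry.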


2016-04-14 16:16 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:

> Never mind, telnet works as well.

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

Never mind, telnet works as well.

2016-04-14 16:10 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:

> I tried telnet and the connection timed out, but I am able to
> connect through SSH.

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

I tried telnet and the connection timed out, but I am able to
connect through SSH.
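
A timeout on one port while SSH succeeds usually suggests that packets to that port are being dropped somewhere on the path (for example by a firewall or security group rule), whereas an immediate "connection refused" usually means the host was reached but nothing is listening on that port. The sketch below is only an illustration and does not come from the thread; it tells the two cases apart. The function name `probe` is made up for this example.

```python
import socket


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all within the timeout: packets are probably dropped.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on this port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. "no route to host" or a name resolution failure.
        return f"error: {exc}"
```

From the slave in Network B one could then run, for instance, `for port in (22, 2181, 5050): print(port, probe("131.154.96.27", port))`. The port list is an assumption: 22 is the usual SSH port, 2181 is the ZooKeeper client port used in the configuration in this thread, and 5050 is the default Mesos master port.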

2016-04-14 16:05 GMT+02:00 Stefano Bianchi <ja...@gmail.com>:

> Thanks for your reply, Flavio.
> Actually, I don't have a DNS server, so I am forced to maintain a hosts file, in which I
> have set all the IP addresses.
> Of course, for the node in Network B I have set the floating IPs of the
> other 2 slaves in Network A, associated with their hostnames. I don't
> know whether this is correct, but at least if I ping from the slave in
> Network B to a slave in A I get replies, and vice versa.

Re: Zookeeper mesos-master on different network

Posted by Stefano Bianchi <ja...@gmail.com>.

Thanks for your reply, Flavio.
Actually, I don't have a DNS server, so I am forced to maintain a hosts file, in which I
have set all the IP addresses.
Of course, for the node in Network B I have set the floating IPs of the other
2 slaves in Network A, associated with their hostnames. I don't know
whether this is correct, but at least if I ping from the slave in Network B
to a slave in A I get replies, and vice versa.
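
Since ping only shows that ICMP gets through, it may also be worth confirming that each hostname in the hosts file resolves to the address that is expected on that network. A minimal sketch follows; the function name `resolve_all` is made up for this example, and the hostnames in the usage note are placeholders because the thread does not give the real ones.

```python
import socket


def resolve_all(hostnames):
    """Map each hostname to the IPv4 address the local resolver returns.

    The resolver normally consults /etc/hosts as well as DNS. A name that
    cannot be resolved is mapped to None.
    """
    result = {}
    for name in hostnames:
        try:
            result[name] = socket.gethostbyname(name)
        except OSError:
            # socket.gaierror is a subclass of OSError.
            result[name] = None
    return result
```

Calling `resolve_all(["master1", "master2", "master3"])` on the slave in Network B should then return floating IPs for the Network A machines and the private address for the local master, if the hosts file is set up as described.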

2016-04-14 15:55 GMT+02:00 Flavio Junqueira <fp...@apache.org>:

> Have you made sure that a slave in net B is able to telnet or ssh to the
> leader machine in net A? Is it possible that the client port is blocked
> from B to A?
>
> -Flavio

Re: Zookeeper mesos-master on different network

Posted by Flavio Junqueira <fp...@apache.org>.

Have you made sure that a slave in net B is able to telnet or ssh to the leader machine in net A? Is it possible that the client port is blocked from B to A?
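
One way to run this check from the slave in Network B is sketched below. It parses the same `zk://` string that is stored in /etc/mesos/zk and attempts a plain TCP connection to each ZooKeeper client port. The function names (`parse_zk_url`, `can_connect`, `check_zk_url`) are made up for this illustration, and the sketch assumes every entry carries an explicit port, as in the configuration quoted in this thread.

```python
import socket


def parse_zk_url(url):
    """Split "zk://h1:p1,h2:p2/path" into ([(host, port), ...], "/path")."""
    prefix = "zk://"
    if not url.startswith(prefix):
        raise ValueError("expected a zk:// URL")
    hosts_part, _, path = url[len(prefix):].partition("/")
    endpoints = []
    for item in hosts_part.split(","):
        # Assumes "host:port"; an entry without a port raises ValueError.
        host, _, port = item.strip().rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints, "/" + path


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_zk_url(url, timeout=3.0):
    """Return {(host, port): reachable} for every endpoint in the zk:// URL."""
    endpoints, _path = parse_zk_url(url)
    return {ep: can_connect(ep[0], ep[1], timeout) for ep in endpoints}
```

Running `check_zk_url(open("/etc/mesos/zk").read().strip())` on the slave would show which endpoints are reachable; a `False` for the Network A addresses would point at a blocked client port.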

-Flavio

