Posted to user@mesos.apache.org by Jordan Zimmerman <jo...@jordanzimmerman.com> on 2014/06/08 06:34:01 UTC

Slaves in a different dc/region?

Has anyone tried running a slave in a different datacenter than the master? It seems the slaves connect to ZooKeeper. Is that correct? If so, cross-data center might not work.

Thanks!



Re: Slaves in a different dc/region?

Posted by Konrad Scherer <ko...@windriver.com>.
On 06/08/2014 12:34 AM, Jordan Zimmerman wrote:
> Has anyone tried running a slave in a different datacenter than the master? It
> seems the slaves connect to ZooKeeper. Is that correct? If so, cross-data center
> might not work.

I am currently running a prototype Mesos cluster with slaves in one datacenter
and the Mesos master in another. I haven't had problems with connections across
the WAN.

I am planning to set up a 3-node ZooKeeper cluster in the Mesos master DC and a
ZooKeeper observer instance in the remote DC. In theory this should further
reduce the number of connections and the bandwidth used across the WAN, since
observers do not take part in the quorum vote. Observers are a fairly recent
ZooKeeper feature (introduced in 3.3). If someone has experience with
multi-datacenter ZooKeeper and Mesos, I would be grateful for any
recommendations.
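
As a rough sketch of the layout described above (not a tested configuration), the snippet below renders a zoo.cfg for three voting servers plus one observer. The hostnames, data directory, and timing values are placeholders I made up; the `peerType=observer` setting and the `:observer` suffix on `server.N` lines are the standard ZooKeeper observer settings.

```python
# Hypothetical hostnames; replace them with the real machines in each datacenter.
VOTERS = ["zk1.dc-a.example.com", "zk2.dc-a.example.com", "zk3.dc-a.example.com"]
OBSERVERS = ["zk-obs1.dc-b.example.com"]


def render_zoo_cfg(voters, observers, for_observer=False):
    """Return zoo.cfg text for one ensemble member.

    Every member lists all servers; observers carry the ':observer'
    suffix, and an observer's own file additionally sets peerType.
    """
    lines = [
        "tickTime=2000",                # placeholder timing values
        "initLimit=10",
        "syncLimit=5",
        "dataDir=/var/lib/zookeeper",   # placeholder path
        "clientPort=2181",
    ]
    if for_observer:
        lines.append("peerType=observer")
    server_id = 1
    for host in voters:
        lines.append(f"server.{server_id}={host}:2888:3888")
        server_id += 1
    for host in observers:
        lines.append(f"server.{server_id}={host}:2888:3888:observer")
        server_id += 1
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Config for the observer in the remote DC.
    print(render_zoo_cfg(VOTERS, OBSERVERS, for_observer=True))
```

The slaves in the remote DC would then point at the observer's client port, so only the observer itself talks to the voting ensemble across the WAN.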

Thanks

-- 
Konrad Scherer, MTS, Linux Products Group, Wind River

Re: Slaves in a different dc/region?

Posted by Tomas Barton <ba...@gmail.com>.
Each Mesos slave keeps a ZooKeeper session, that's correct. When the
connection is lost, the slave will simply reconnect:

ZOO_ERROR@handle_socket_error_msg@1643: Socket [192.168.1.1:2181] zk retcode=-7, errno=110(Connection timed out): connection to 192.168.1.1:2181 timed out (exceeded timeout by 1ms)
I0606 06:25:14.611565 19666 group.cpp:415] Lost connection to ZooKeeper, attempting to reconnect ...
2014-06-06 06:25:17,947:19643(0x7f98f9e15700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 3337ms
2014-06-06 06:25:17,950:19643(0x7f98f9e15700):ZOO_INFO@check_events@1703: initiated connection to server [192.168.1.1:2181]
2014-06-06 06:25:18,381:19643(0x7f98f9e15700):ZOO_INFO@check_events@1750: session establishment complete on server [192.168.1.1:2181], sessionId=0x246349f15510076, negotiated timeout=10000
I0606 06:25:18.381938 19667 group.cpp:310] Group process ((5)@192.168.1.10:5051) reconnected to ZooKeeper


Not much information is stored in ZooKeeper; it's pretty much just the IP
address of the master node, so that communication won't be intensive.
However, each slave has to send status updates for its assigned tasks to the
Mesos master. If each task takes, say, a few minutes to compute and the
communication delay is around 100 ms, it should be fine.
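
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a deliberately simple model in which each task costs a fixed number of WAN round trips on top of its compute time; the function name and the one-round-trip default are my own simplifications, not anything Mesos defines.

```python
def relative_overhead(task_seconds, delay_seconds, round_trips=1):
    """Fraction of a task's wall-clock time spent waiting on the WAN."""
    waiting = delay_seconds * round_trips
    return waiting / (task_seconds + waiting)


if __name__ == "__main__":
    delay = 0.100  # 100 ms communication delay, the figure used above
    for label, task_seconds in [("3-minute task", 180.0), ("300 ms task", 0.3)]:
        share = relative_overhead(task_seconds, delay)
        print(f"{label}: {share:.2%} of the time is communication delay")
```

With these inputs the delay is a negligible share of a task that runs for minutes, but a quarter of the total for a 300 ms task, which is why very short tasks are the case to worry about.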



On 8 June 2014 17:19, David Greenberg <ds...@gmail.com> wrote:

> I believe that slaves only use ZK to discover the masters initially--they
> directly communicate with them from then on, so the problem of WAN
> latencies is somewhat mitigated.
>
>
> On Sun, Jun 8, 2014 at 10:45 AM, Jordan Zimmerman <
> jordan@jordanzimmerman.com> wrote:
>
>> But if the slaves try to maintain a ZooKeeper connection there will be
>> instability. WANs aren’t very reliable and ZK clients maintain a session.
>> Do the slaves query only? What would happen if the slave lost connection to
>> ZooKeeper?
>>
>> -Jordan
>>
>>
>> From: Tomas Barton barton.tomas@gmail.com
>> Reply: user@mesos.apache.org user@mesos.apache.org
>> Date: June 8, 2014 at 8:06:47 AM
>> To: user user@mesos.apache.org
>> Subject:  Re: Slaves in a different dc/region?
>>
>>  Hi,
>>
>> generally it should work. Mesos slave gets from ZooKeeper current master
>> IP address. ZooKeepers should be deployed in one datacenter (usually 3 or 5
>> instances). If you will run on Mesos
>> long term tasks it should be fine. If you would deploy e.g. Spark which
>> tends to have quite short tasks (let's say few hundreds milliseconds), the
>> computations might be slower due to longer communication.
>>
>> It really depends on your use case, it might be good idea to have a Mesos
>> cluster in each datacenter. However you might try adjusting schedulers so
>> that they would respect slaves location, e.g. prefer allocating task from
>> one framework at the same datacenter, if the resources are available.
>>
>> Tomas
>>
>>
>>
>> On 8 June 2014 06:34, Jordan Zimmerman <jo...@jordanzimmerman.com>
>> wrote:
>>
>>>  Has anyone tried running a slave in a different datacenter than the
>>> master? It seems the slaves connect to ZooKeeper. Is that correct? If so,
>>> cross-data center might not work.
>>>
>>>  Thanks!
>>>
>>>
>>>
>>
>

Re: Slaves in a different dc/region?

Posted by David Greenberg <ds...@gmail.com>.
I believe that slaves only use ZK to discover the masters initially--they
directly communicate with them from then on, so the problem of WAN
latencies is somewhat mitigated.



Re: Slaves in a different dc/region?

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
But if the slaves try to maintain a ZooKeeper connection there will be instability. WANs aren’t very reliable and ZK clients maintain a session. Do the slaves query only? What would happen if the slave lost connection to ZooKeeper?

-Jordan






Re: Slaves in a different dc/region?

Posted by Tomas Barton <ba...@gmail.com>.
Hi,

Generally it should work. A Mesos slave gets the current master's IP address
from ZooKeeper. The ZooKeeper instances should be deployed in one datacenter
(usually 3 or 5 of them). If you run long-running tasks on Mesos, it should be
fine. If you deploy something like Spark, which tends to have quite short
tasks (say, a few hundred milliseconds), the computations might be slower
because of the longer communication delay.

It really depends on your use case; it might be a good idea to have a Mesos
cluster in each datacenter. However, you might try adjusting the schedulers so
that they respect the slaves' location, e.g. prefer placing a framework's
tasks in the same datacenter if the resources are available there.
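
A minimal sketch of that idea, assuming each slave advertises a "datacenter" attribute (for example via the slave's `--attributes` flag) and that the scheduler sees it on every offer. The offers here are plain dicts invented for illustration, not the real Mesos offer objects, and `pick_offers` is a hypothetical helper.

```python
def datacenter_of(offer):
    """Read the (assumed) 'datacenter' attribute from an offer dict."""
    return offer.get("attributes", {}).get("datacenter")


def pick_offers(offers, preferred_dc, cpus_needed):
    """Choose offers covering cpus_needed, taking the preferred DC first.

    Remote offers are used only when local ones are not enough.
    Returns an empty list if the demand cannot be met at all.
    """
    local = [o for o in offers if datacenter_of(o) == preferred_dc]
    remote = [o for o in offers if datacenter_of(o) != preferred_dc]
    chosen, cpus = [], 0.0
    for offer in local + remote:
        if cpus >= cpus_needed:
            break
        chosen.append(offer)
        cpus += offer["cpus"]
    return chosen if cpus >= cpus_needed else []


if __name__ == "__main__":
    offers = [
        {"id": "o1", "cpus": 2.0, "attributes": {"datacenter": "dc-b"}},
        {"id": "o2", "cpus": 2.0, "attributes": {"datacenter": "dc-a"}},
        {"id": "o3", "cpus": 4.0, "attributes": {"datacenter": "dc-a"}},
    ]
    print([o["id"] for o in pick_offers(offers, "dc-a", 4.0)])
```

A real framework would also have to decline the offers it does not use and account for memory and other resources; this only shows the locality preference.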

Tomas


