Posted to common-dev@hadoop.apache.org by Raja Nagendra Kumar <Na...@tejasoft.com> on 2011/07/03 04:11:51 UTC

Hadoop Master and Slave Discovery

Hi,

Instead of depending on locally synced configuration files, would it be a
better approach to adopt the JINI discovery model, in which masters and
slaves can discover each other dynamically through UDP broadcast/heartbeat
methods?

This would mean any machine could come up, announce itself as a slave,
automatically discover the master, and start supporting the master within <
x seconds.
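
A rough sketch of the kind of exchange being proposed (the port number,
message string, and reply handling below are invented purely for
illustration; nothing of the sort exists in Hadoop today):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class SlaveDiscovery {
    // Hypothetical discovery port; not part of any Hadoop release.
    private static final int DISCOVERY_PORT = 50999;

    public static InetAddress findMaster(int timeoutMs) throws Exception {
        DatagramSocket socket = new DatagramSocket();
        try {
            socket.setBroadcast(true);
            socket.setSoTimeout(timeoutMs);
            byte[] hello = "SLAVE_HELLO".getBytes("UTF-8");
            // Broadcast "I am a slave" on the local subnet.
            socket.send(new DatagramPacket(hello, hello.length,
                    InetAddress.getByName("255.255.255.255"), DISCOVERY_PORT));
            // Wait for the master to answer; times out if no master is listening.
            DatagramPacket reply = new DatagramPacket(new byte[256], 256);
            socket.receive(reply);
            return reply.getAddress();
        } finally {
            socket.close();
        }
    }
}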

Regards,
Raja Nagendra Kumar,
C.T.O
www.tejasoft.com
-Hadoop Adoption Consulting




Re: Hadoop Master and Slave Discovery

Posted by Warren Turkal <wt...@penguintechs.org>.
Stable infrastructures need deterministic behavior in order to be
understood. I believe that mDNS limits the determinism of a system by
requiring that I accept that a machine will pick a random address. When I
set up my DCs, I want machines to have a single, constant IP address so that
I don't have to do a bunch of work to figure out which machine I am trying
to talk to when things go wrong.

As such, I find it hard to believe that mDNS belongs in large DCs.

wt

On Thu, Jul 7, 2011 at 10:17 AM, Eric Yang <er...@gmail.com> wrote:

> Internet Assigned Number Authority has allocated 169.254.1.0 to
> 169.254.254.255 for the propose of communicate between nodes.  This is
> 65024 IP address designed for local area network only.  They are not
> allow to be routed.  Zeroconf is randomly selecting one address out of
> the 65024 available address, and broadcasts ARP message.  If no one is
> using this address, then the machine will use the selected ip address
> and communicate with zeroconf service for name resolution.  If the
> address is already in use, the system restart from scratch to pick
> another address.  Hence, the actual limit is not bound to 1000, but
> 65024.  In real life, it is unlikely to use all 65024 for name
> resolution due to chance of loss packet on modern ethernet (10^-12) or
> delay from repetitive selection of the the same ip address from
> different hosts.  It can easily push the limit to 10,000-20,000 nodes
> without losing reliability in server farm settings.
>
> It would be nice to support both dynamic discovery of master and
> slaves, and preserve the exist configuration style management for EC2
> like deployment.  This is one innovation worth having.
>
> regards,
> Eric
>
> On Wed, Jul 6, 2011 at 5:49 PM, Allen Wittenauer <aw...@apache.org> wrote:
> >
> > On Jul 6, 2011, at 5:05 PM, Eric Yang wrote:
> >
> >> Did you know that almost all linux desktop system comes with avahi
> >> pre-installed and turn on by default?
> >
> >        ... which is why most admins turn those services off by default.
> :)
> >
> >>  What is more interesting is
> >> that there are thousands of those machines broadcast in large
> >> cooperation without anyone noticing them?
> >
> >        That's because many network teams turn off multicast past the
> subnet boundary and many corporate desktops are in class C subnets.  This
> automatically limits the host count down to 200-ish per network.  Usually
> just the unicast traffic is bad enough.  Throwing multicast into the mix
> just makes it worse.
> >
> >> I have recently built a
> >> multicast dns browser and look into the number of machines running in
> >> a large company environment.  The number of desktop, laptop and
> >> printer machines running multicast dns is far exceeding 1000 machines
> >> in the local subnet.
> >
> >        From my understanding of Y!'s network, the few /22's they have
> (which would get you 1022 potential hosts on a subnet) have multicast
> traffic dropped at the router and switch levels.  Additionally, DNS-SD (the
> service discovery portion of mDNS) offers unicast support as well.  So there
> is a very good chance that the traffic you are seeing is from unicast, not
> multicast.
> >
> >        The 1000 number, BTW, comes from Apple.  I'm sure they'd be
> interested in your findings given their role in ZC.
> >
> >        BTW, I'd much rather hear that you set up a /22 with many many
> machines running VMs trying to actually use mDNS for something useful.  A
> service browser really isn't that interesting.
> >
> >> They are all happily working fine without causing any issues.
> >
> >        ... that you know of.  Again, I'm 99% certain that Y! is dropping
> multicast packets into the bit bucket at the switch boundaries.  [I remember
> having this conversation with them when we setup the new data centers.]
> >
> >>  Printer works fine,
> >
> >        Most admins turn SLP and other broadcast services on printers off.
>   For large networks, one usually sees print services enabled via AD or
> master print servers broadcasting the information on the local subnet.  This
> allows a central point of control rather than randomness.   Snow Leopard (I
> don't think Leopard did this) actually tells you where the printer is coming
> from now, so that's handy to see if they are ZC or AD or whatever.
> >
> >> itune sharing from someone
> >> else works fine.
> >
> >        iTunes specifically limits its reach so that it can't extend
> beyond the local subnet and definitely does unicast in addition to ZC, so
> that doesn't really say much of anything, other than potentially
> invalidating your results.
> >
> >>  For some reason, things tend to work better on my
> >> side of universe. :)
> >
> >        I'm sure it does, but not for the reasons you think they do.
> >
> >> Allen, if you want to get stuck on stone age
> >> tools, I won't stop you.
> >>
> >
> >        Multicast has a time and place (mainly for small, non-busy
> networks).  Using it without understanding the network impact is never a
> good idea.
> >
> >        FWIW, I've seen multicast traffic bring down an entire campus of
> tens of thousands of machines due to routers and switches having bugs where
> they didn't subtract from the packet's TTL.  I'm not the only one with these
> types of experiences.  Anything multicast is going to have a very large
> uphill battle for adoption because of these widespread problems.  Many
> network vendors really don't get this one right, for some reason.
>

Re: Hadoop Master and Slave Discovery

Posted by Eric Yang <er...@gmail.com>.
The Internet Assigned Numbers Authority has allocated 169.254.1.0 to
169.254.254.255 for the purpose of communication between nodes.  These are
65,024 IP addresses designed for local area network use only.  They are not
allowed to be routed.  Zeroconf randomly selects one address out of the
65,024 available addresses and broadcasts an ARP message.  If no one is
using this address, then the machine will use the selected IP address
and communicate with the Zeroconf service for name resolution.  If the
address is already in use, the system restarts from scratch to pick
another address.  Hence, the actual limit is not bound to 1000, but
65,024.  In real life, it is unlikely that all 65,024 would be used for
name resolution, due to the chance of packet loss on modern Ethernet
(10^-12) or delays from repeated selection of the same IP address by
different hosts.  This can easily push the limit to 10,000-20,000 nodes
without losing reliability in server farm settings.
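
A rough Java sketch of the selection loop described above; the conflict
probe is stubbed out, since a real implementation would send ARP probes as
specified in RFC 3927:

import java.net.InetAddress;
import java.util.Random;

public class LinkLocalPicker {
    // Pick a random address in 169.254.1.0 - 169.254.254.255
    // (254 * 256 = 65,024 candidates), retrying on conflict.
    static InetAddress pickAddress(Random rnd) throws Exception {
        while (true) {
            int third = 1 + rnd.nextInt(254);   // 1..254
            int fourth = rnd.nextInt(256);      // 0..255
            InetAddress candidate =
                    InetAddress.getByName("169.254." + third + "." + fourth);
            if (!addressInUse(candidate)) {
                return candidate;   // claim it and register with mDNS
            }
            // someone already holds it: start over with a new random pick
        }
    }

    // Stub standing in for the ARP probe a real Zeroconf stack performs.
    static boolean addressInUse(InetAddress candidate) {
        return false;
    }
}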

It would be nice to support both dynamic discovery of masters and slaves
and to preserve the existing configuration-style management for EC2-like
deployments.  This is one innovation worth having.

regards,
Eric

On Wed, Jul 6, 2011 at 5:49 PM, Allen Wittenauer <aw...@apache.org> wrote:
>
> On Jul 6, 2011, at 5:05 PM, Eric Yang wrote:
>
>> Did you know that almost all linux desktop system comes with avahi
>> pre-installed and turn on by default?
>
>        ... which is why most admins turn those services off by default. :)
>
>>  What is more interesting is
>> that there are thousands of those machines broadcast in large
>> cooperation without anyone noticing them?
>
>        That's because many network teams turn off multicast past the subnet boundary and many corporate desktops are in class C subnets.  This automatically limits the host count down to 200-ish per network.  Usually just the unicast traffic is bad enough.  Throwing multicast into the mix just makes it worse.
>
>> I have recently built a
>> multicast dns browser and look into the number of machines running in
>> a large company environment.  The number of desktop, laptop and
>> printer machines running multicast dns is far exceeding 1000 machines
>> in the local subnet.
>
>        From my understanding of Y!'s network, the few /22's they have (which would get you 1022 potential hosts on a subnet) have multicast traffic dropped at the router and switch levels.  Additionally, DNS-SD (the service discovery portion of mDNS) offers unicast support as well.  So there is a very good chance that the traffic you are seeing is from unicast, not multicast.
>
>        The 1000 number, BTW, comes from Apple.  I'm sure they'd be interested in your findings given their role in ZC.
>
>        BTW, I'd much rather hear that you set up a /22 with many many machines running VMs trying to actually use mDNS for something useful.  A service browser really isn't that interesting.
>
>> They are all happily working fine without causing any issues.
>
>        ... that you know of.  Again, I'm 99% certain that Y! is dropping multicast packets into the bit bucket at the switch boundaries.  [I remember having this conversation with them when we setup the new data centers.]
>
>>  Printer works fine,
>
>        Most admins turn SLP and other broadcast services on printers off.   For large networks, one usually sees print services enabled via AD or master print servers broadcasting the information on the local subnet.  This allows a central point of control rather than randomness.   Snow Leopard (I don't think Leopard did this) actually tells you where the printer is coming from now, so that's handy to see if they are ZC or AD or whatever.
>
>> itune sharing from someone
>> else works fine.
>
>        iTunes specifically limits its reach so that it can't extend beyond the local subnet and definitely does unicast in addition to ZC, so that doesn't really say much of anything, other than potentially invalidating your results.
>
>>  For some reason, things tend to work better on my
>> side of universe. :)
>
>        I'm sure it does, but not for the reasons you think they do.
>
>> Allen, if you want to get stuck on stone age
>> tools, I won't stop you.
>>
>
>        Multicast has a time and place (mainly for small, non-busy networks).  Using it without understanding the network impact is never a good idea.
>
>        FWIW, I've seen multicast traffic bring down an entire campus of tens of thousands of machines due to routers and switches having bugs where they didn't subtract from the packet's TTL.  I'm not the only one with these types of experiences.  Anything multicast is going to have a very large uphill battle for adoption because of these widespread problems.  Many network vendors really don't get this one right, for some reason.

Re: Hadoop Master and Slave Discovery

Posted by Warren Turkal <wt...@penguintechs.org>.
Frankly, I agree with Allen's comments.

I think that discovering the ZooKeeper ensemble should be done with a
well-known DNS name (e.g. zookeeper.$cluster.prod.example.com). It would be
pretty rare for something like the address of the ZooKeeper ensemble to
change in a stable infrastructure. When it does, DNS can be updated as part
of the procedure for the change.
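
A minimal sketch of that style of lookup for a ZooKeeper client; the host
name follows the example above, while the port, session timeout, and no-op
watcher are only illustrative:

import java.net.InetAddress;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkFromDns {
    public static ZooKeeper connect(String cluster) throws Exception {
        // One well-known name per cluster, updated when the ensemble moves.
        String fqdn = "zookeeper." + cluster + ".prod.example.com";
        StringBuilder hosts = new StringBuilder();
        for (InetAddress addr : InetAddress.getAllByName(fqdn)) {
            if (hosts.length() > 0) {
                hosts.append(',');
            }
            hosts.append(addr.getHostAddress()).append(":2181");
        }
        // 30s session timeout and a no-op watcher, purely for illustration.
        return new ZooKeeper(hosts.toString(), 30000, new Watcher() {
            public void process(WatchedEvent event) {
            }
        });
    }
}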

Using multicast, on the other hand, introduces a higher barrier to getting
a Hadoop cluster running, as one must then troubleshoot any multicast
issues that come up.

wt

On Wed, Jul 6, 2011 at 5:49 PM, Allen Wittenauer <aw...@apache.org> wrote:

>
> On Jul 6, 2011, at 5:05 PM, Eric Yang wrote:
>
> > Did you know that almost all linux desktop system comes with avahi
> > pre-installed and turn on by default?
>
>        ... which is why most admins turn those services off by default. :)
>
> >  What is more interesting is
> > that there are thousands of those machines broadcast in large
> > cooperation without anyone noticing them?
>
>        That's because many network teams turn off multicast past the subnet
> boundary and many corporate desktops are in class C subnets.  This
> automatically limits the host count down to 200-ish per network.  Usually
> just the unicast traffic is bad enough.  Throwing multicast into the mix
> just makes it worse.
>
> > I have recently built a
> > multicast dns browser and look into the number of machines running in
> > a large company environment.  The number of desktop, laptop and
> > printer machines running multicast dns is far exceeding 1000 machines
> > in the local subnet.
>
>        From my understanding of Y!'s network, the few /22's they have
> (which would get you 1022 potential hosts on a subnet) have multicast
> traffic dropped at the router and switch levels.  Additionally, DNS-SD (the
> service discovery portion of mDNS) offers unicast support as well.  So there
> is a very good chance that the traffic you are seeing is from unicast, not
> multicast.
>
>        The 1000 number, BTW, comes from Apple.  I'm sure they'd be
> interested in your findings given their role in ZC.
>
>        BTW, I'd much rather hear that you set up a /22 with many many
> machines running VMs trying to actually use mDNS for something useful.  A
> service browser really isn't that interesting.
>
> > They are all happily working fine without causing any issues.
>
>        ... that you know of.  Again, I'm 99% certain that Y! is dropping
> multicast packets into the bit bucket at the switch boundaries.  [I remember
> having this conversation with them when we setup the new data centers.]
>
> >  Printer works fine,
>
>        Most admins turn SLP and other broadcast services on printers off.
> For large networks, one usually sees print services enabled via AD or master
> print servers broadcasting the information on the local subnet.  This allows
> a central point of control rather than randomness.   Snow Leopard (I don't
> think Leopard did this) actually tells you where the printer is coming from
> now, so that's handy to see if they are ZC or AD or whatever.
>
> > itune sharing from someone
> > else works fine.
>
>        iTunes specifically limits its reach so that it can't extend beyond
> the local subnet and definitely does unicast in addition to ZC, so that
> doesn't really say much of anything, other than potentially invalidating
> your results.
>
> >  For some reason, things tend to work better on my
> > side of universe. :)
>
>        I'm sure it does, but not for the reasons you think they do.
>
> > Allen, if you want to get stuck on stone age
> > tools, I won't stop you.
> >
>
>        Multicast has a time and place (mainly for small, non-busy
> networks).  Using it without understanding the network impact is never a
> good idea.
>
>        FWIW, I've seen multicast traffic bring down an entire campus of
> tens of thousands of machines due to routers and switches having bugs where
> they didn't subtract from the packet's TTL.  I'm not the only one with these
> types of experiences.  Anything multicast is going to have a very large
> uphill battle for adoption because of these widespread problems.  Many
> network vendors really don't get this one right, for some reason.

Re: Hadoop Master and Slave Discovery

Posted by Allen Wittenauer <aw...@apache.org>.
On Jul 6, 2011, at 5:05 PM, Eric Yang wrote:

> Did you know that almost all linux desktop system comes with avahi
> pre-installed and turn on by default?

	... which is why most admins turn those services off by default. :)

>  What is more interesting is
> that there are thousands of those machines broadcast in large
> cooperation without anyone noticing them? 

	That's because many network teams turn off multicast past the subnet boundary and many corporate desktops are in class C subnets.  This automatically limits the host count down to 200-ish per network.  Usually just the unicast traffic is bad enough.  Throwing multicast into the mix just makes it worse.

> I have recently built a
> multicast dns browser and look into the number of machines running in
> a large company environment.  The number of desktop, laptop and
> printer machines running multicast dns is far exceeding 1000 machines
> in the local subnet.

	From my understanding of Y!'s network, the few /22's they have (which would get you 1022 potential hosts on a subnet) have multicast traffic dropped at the router and switch levels.  Additionally, DNS-SD (the service discovery portion of mDNS) offers unicast support as well.  So there is a very good chance that the traffic you are seeing is from unicast, not multicast.

	The 1000 number, BTW, comes from Apple.  I'm sure they'd be interested in your findings given their role in ZC.  

	BTW, I'd much rather hear that you set up a /22 with many many machines running VMs trying to actually use mDNS for something useful.  A service browser really isn't that interesting.

> They are all happily working fine without causing any issues.

	... that you know of.  Again, I'm 99% certain that Y! is dropping multicast packets into the bit bucket at the switch boundaries.  [I remember having this conversation with them when we set up the new data centers.]

>  Printer works fine,

	Most admins turn SLP and other broadcast services on printers off.   For large networks, one usually sees print services enabled via AD or master print servers broadcasting the information on the local subnet.  This allows a central point of control rather than randomness.   Snow Leopard (I don't think Leopard did this) actually tells you where the printer is coming from now, so that's handy to see if they are ZC or AD or whatever.

> itune sharing from someone
> else works fine.

	iTunes specifically limits its reach so that it can't extend beyond the local subnet and definitely does unicast in addition to ZC, so that doesn't really say much of anything, other than potentially invalidating your results.

>  For some reason, things tend to work better on my
> side of universe. :)  

	I'm sure it does, but not for the reasons you think.

> Allen, if you want to get stuck on stone age
> tools, I won't stop you.
> 
	
	Multicast has a time and place (mainly for small, non-busy networks).  Using it without understanding the network impact is never a good idea.

	FWIW, I've seen multicast traffic bring down an entire campus of tens of thousands of machines due to routers and switches having bugs where they didn't subtract from the packet's TTL.  I'm not the only one with these types of experiences.  Anything multicast is going to have a very large uphill battle for adoption because of these widespread problems.  Many network vendors really don't get this one right, for some reason.

Re: Hadoop Master and Slave Discovery

Posted by Eric Yang <er...@gmail.com>.
Did you know that almost all Linux desktop systems come with Avahi
pre-installed and turned on by default?  What is more interesting is
that there are thousands of those machines broadcasting in large
corporations without anyone noticing them.  I recently built a
multicast DNS browser and looked at the number of machines running
mDNS in a large company environment.  The number of desktop, laptop and
printer machines running multicast DNS far exceeds 1000 machines
in the local subnet.  They are all happily working fine without
causing any issues.  Printers work fine, iTunes sharing from someone
else works fine.  For some reason, things tend to work better on my
side of the universe. :)  Allen, if you want to get stuck on stone-age
tools, I won't stop you.

regards,
Eric

On Wed, Jul 6, 2011 at 4:02 PM, Allen Wittenauer <aw...@apache.org> wrote:
>
> On Jul 5, 2011, at 2:40 AM, Steve Loughran wrote:
>> 1. you could use DNS proper, by way of Bonjour/avahi. You don't need to be running any mDNS server to support .local, and I would strongly advise against it in a large cluster (because .local resolution puts a lot of CPU load on every server in the subnet).
>
>        +1 mDNS doesn't scale to large sizes.  The only number I've ever heard is up to 1000 hosts (not services!) before the whole system falls apart. I don't think it was meant to scale past like a class C subnet.
>
>        Something else to keep in mind:  a lot of network gear gets multicast really really wrong.  There are reasons why network admins are very happy that Hadoop doesn't use multicast.
>
>        ... and all that's before one talks about the security implications.

Re: Hadoop Master and Slave Discovery

Posted by Allen Wittenauer <aw...@apache.org>.
On Jul 5, 2011, at 2:40 AM, Steve Loughran wrote:
> 1. you could use DNS proper, by way of Bonjour/avahi. You don't need to be running any mDNS server to support .local, and I would strongly advise against it in a large cluster (because .local resolution puts a lot of CPU load on every server in the subnet).

	+1. mDNS doesn't scale to large sizes.  The only number I've ever heard is up to 1000 hosts (not services!) before the whole system falls apart.  I don't think it was meant to scale much past a class C subnet.

	Something else to keep in mind:  a lot of network gear gets multicast really really wrong.  There are reasons why network admins are very happy that Hadoop doesn't use multicast.

	... and all that's before one talks about the security implications.

Re: Hadoop Master and Slave Discovery

Posted by Eric Yang <er...@gmail.com>.
I am currently working on RPM packages for Zookeeper, Pig, Hive and
HCat.  It may take a while for me to circle back to this.  Nevertheless,
it is interesting work that I would like to contribute.  Thanks.

regards,
Eric

On Wed, Jul 6, 2011 at 9:18 AM, Patrick Hunt <ph...@apache.org> wrote:
> Eric, I'd be happy to work with you to get it committed if you'd like
> to take a whack. Would be a great addition to contrib.
>
> Patrick
>
> On Wed, Jul 6, 2011 at 9:15 AM, Eric Yang <er...@gmail.com> wrote:
>> It would be nicer, if it was written in Java.  I think something wrap
>> on top of jmdns would be a better fit for Zookeeper.
>>
>> regards,
>> Eric
>>
>> On Wed, Jul 6, 2011 at 9:10 AM, Patrick Hunt <ph...@apache.org> wrote:
>>> There's a long standing "ZooKeeper DNS server" jira which can be found
>>> here, someone has already created a basic implementation:
>>> https://issues.apache.org/jira/browse/ZOOKEEPER-703
>>>
>>> Patrick
>>>
>>> On Wed, Jul 6, 2011 at 2:52 AM, Steve Loughran <st...@apache.org> wrote:
>>>> On 05/07/11 23:00, Eric Yang wrote:
>>>>>
>>>>> In another project, I have implemented a bonjour beacon (jmdns) which
>>>>> sit on the Zookeeper nodes to advertise the location of zookeeper
>>>>> servers.  When clients start up, it will discover location of
>>>>> zookeeper through multicast dns.  Once, the server locations are
>>>>> resolved (ip:port and TXT records), the clients shutdown mdns
>>>>> resolvers.  Client proceed to use the resolved list for zookeeper
>>>>> access.  There does not seem to be cpu overhead incurred by the
>>>>> beacon, nor the clients.  If a client could not connect to zookeeper
>>>>> anymore, then it will start mdns resolvers to look for new list of
>>>>> zookeeper servers.  The code for the project is located at:
>>>>>
>>>>> http://github.com/macroadster/hms
>>>>>
>>>>> It may be possible to use similar approach for location resolution,
>>>>> and load rest of the config through zookeeper.
>>>>>
>>>>> regards,
>>>>> Eric
>>>>>
>>>>
>>>> That's interesting; I think it's more important for clients to be able to
>>>> bind dynamically than it is for the cluster machines, as they should be
>>>> managed anyway.
>>>>
>>>> When I was doing the hadoop-in-VM clustering stuff, I had a well-known URL
>>>> to serve up the relevant XML file for the cluster from the JT -all it did
>>>> was relay the request to the JT at whatever host it had been assigned. All
>>>> the clients needed to know was the URL of the config server, and they could
>>>> bootstrap to working against clusters whose FS and JT URLs were different
>>>> from run to run.
>>>>
>>>> zookeeper discovery would benefit a lot of projects
>>>>
>>>>
>>>
>>
>

Re: Hadoop Master and Slave Discovery

Posted by Patrick Hunt <ph...@apache.org>.
Eric, I'd be happy to work with you to get it committed if you'd like
to take a whack. Would be a great addition to contrib.

Patrick

On Wed, Jul 6, 2011 at 9:15 AM, Eric Yang <er...@gmail.com> wrote:
> It would be nicer, if it was written in Java.  I think something wrap
> on top of jmdns would be a better fit for Zookeeper.
>
> regards,
> Eric
>
> On Wed, Jul 6, 2011 at 9:10 AM, Patrick Hunt <ph...@apache.org> wrote:
>> There's a long standing "ZooKeeper DNS server" jira which can be found
>> here, someone has already created a basic implementation:
>> https://issues.apache.org/jira/browse/ZOOKEEPER-703
>>
>> Patrick
>>
>> On Wed, Jul 6, 2011 at 2:52 AM, Steve Loughran <st...@apache.org> wrote:
>>> On 05/07/11 23:00, Eric Yang wrote:
>>>>
>>>> In another project, I have implemented a bonjour beacon (jmdns) which
>>>> sit on the Zookeeper nodes to advertise the location of zookeeper
>>>> servers.  When clients start up, it will discover location of
>>>> zookeeper through multicast dns.  Once, the server locations are
>>>> resolved (ip:port and TXT records), the clients shutdown mdns
>>>> resolvers.  Client proceed to use the resolved list for zookeeper
>>>> access.  There does not seem to be cpu overhead incurred by the
>>>> beacon, nor the clients.  If a client could not connect to zookeeper
>>>> anymore, then it will start mdns resolvers to look for new list of
>>>> zookeeper servers.  The code for the project is located at:
>>>>
>>>> http://github.com/macroadster/hms
>>>>
>>>> It may be possible to use similar approach for location resolution,
>>>> and load rest of the config through zookeeper.
>>>>
>>>> regards,
>>>> Eric
>>>>
>>>
>>> That's interesting; I think it's more important for clients to be able to
>>> bind dynamically than it is for the cluster machines, as they should be
>>> managed anyway.
>>>
>>> When I was doing the hadoop-in-VM clustering stuff, I had a well-known URL
>>> to serve up the relevant XML file for the cluster from the JT -all it did
>>> was relay the request to the JT at whatever host it had been assigned. All
>>> the clients needed to know was the URL of the config server, and they could
>>> bootstrap to working against clusters whose FS and JT URLs were different
>>> from run to run.
>>>
>>> zookeeper discovery would benefit a lot of projects
>>>
>>>
>>
>

Re: Hadoop Master and Slave Discovery

Posted by Eric Yang <er...@gmail.com>.
It would be nicer if it were written in Java.  I think something wrapped
on top of jmdns would be a better fit for ZooKeeper.

regards,
Eric

On Wed, Jul 6, 2011 at 9:10 AM, Patrick Hunt <ph...@apache.org> wrote:
> There's a long standing "ZooKeeper DNS server" jira which can be found
> here, someone has already created a basic implementation:
> https://issues.apache.org/jira/browse/ZOOKEEPER-703
>
> Patrick
>
> On Wed, Jul 6, 2011 at 2:52 AM, Steve Loughran <st...@apache.org> wrote:
>> On 05/07/11 23:00, Eric Yang wrote:
>>>
>>> In another project, I have implemented a bonjour beacon (jmdns) which
>>> sit on the Zookeeper nodes to advertise the location of zookeeper
>>> servers.  When clients start up, it will discover location of
>>> zookeeper through multicast dns.  Once, the server locations are
>>> resolved (ip:port and TXT records), the clients shutdown mdns
>>> resolvers.  Client proceed to use the resolved list for zookeeper
>>> access.  There does not seem to be cpu overhead incurred by the
>>> beacon, nor the clients.  If a client could not connect to zookeeper
>>> anymore, then it will start mdns resolvers to look for new list of
>>> zookeeper servers.  The code for the project is located at:
>>>
>>> http://github.com/macroadster/hms
>>>
>>> It may be possible to use similar approach for location resolution,
>>> and load rest of the config through zookeeper.
>>>
>>> regards,
>>> Eric
>>>
>>
>> That's interesting; I think it's more important for clients to be able to
>> bind dynamically than it is for the cluster machines, as they should be
>> managed anyway.
>>
>> When I was doing the hadoop-in-VM clustering stuff, I had a well-known URL
>> to serve up the relevant XML file for the cluster from the JT -all it did
>> was relay the request to the JT at whatever host it had been assigned. All
>> the clients needed to know was the URL of the config server, and they could
>> bootstrap to working against clusters whose FS and JT URLs were different
>> from run to run.
>>
>> zookeeper discovery would benefit a lot of projects
>>
>>
>

Re: Hadoop Master and Slave Discovery

Posted by Patrick Hunt <ph...@apache.org>.
There's a long-standing "ZooKeeper DNS server" JIRA, which can be found
here; someone has already created a basic implementation:
https://issues.apache.org/jira/browse/ZOOKEEPER-703

Patrick

On Wed, Jul 6, 2011 at 2:52 AM, Steve Loughran <st...@apache.org> wrote:
> On 05/07/11 23:00, Eric Yang wrote:
>>
>> In another project, I have implemented a bonjour beacon (jmdns) which
>> sit on the Zookeeper nodes to advertise the location of zookeeper
>> servers.  When clients start up, it will discover location of
>> zookeeper through multicast dns.  Once, the server locations are
>> resolved (ip:port and TXT records), the clients shutdown mdns
>> resolvers.  Client proceed to use the resolved list for zookeeper
>> access.  There does not seem to be cpu overhead incurred by the
>> beacon, nor the clients.  If a client could not connect to zookeeper
>> anymore, then it will start mdns resolvers to look for new list of
>> zookeeper servers.  The code for the project is located at:
>>
>> http://github.com/macroadster/hms
>>
>> It may be possible to use similar approach for location resolution,
>> and load rest of the config through zookeeper.
>>
>> regards,
>> Eric
>>
>
> That's interesting; I think it's more important for clients to be able to
> bind dynamically than it is for the cluster machines, as they should be
> managed anyway.
>
> When I was doing the hadoop-in-VM clustering stuff, I had a well-known URL
> to serve up the relevant XML file for the cluster from the JT -all it did
> was relay the request to the JT at whatever host it had been assigned. All
> the clients needed to know was the URL of the config server, and they could
> bootstrap to working against clusters whose FS and JT URLs were different
> from run to run.
>
> zookeeper discovery would benefit a lot of projects
>
>

Re: Hadoop Master and Slave Discovery

Posted by Steve Loughran <st...@apache.org>.
On 05/07/11 23:00, Eric Yang wrote:
> In another project, I have implemented a bonjour beacon (jmdns) which
> sit on the Zookeeper nodes to advertise the location of zookeeper
> servers.  When clients start up, it will discover location of
> zookeeper through multicast dns.  Once, the server locations are
> resolved (ip:port and TXT records), the clients shutdown mdns
> resolvers.  Client proceed to use the resolved list for zookeeper
> access.  There does not seem to be cpu overhead incurred by the
> beacon, nor the clients.  If a client could not connect to zookeeper
> anymore, then it will start mdns resolvers to look for new list of
> zookeeper servers.  The code for the project is located at:
>
> http://github.com/macroadster/hms
>
> It may be possible to use similar approach for location resolution,
> and load rest of the config through zookeeper.
>
> regards,
> Eric
>

That's interesting; I think it's more important for clients to be able 
to bind dynamically than it is for the cluster machines, as they should 
be managed anyway.

When I was doing the hadoop-in-VM clustering stuff, I had a well-known 
URL to serve up the relevant XML file for the cluster from the JT -all 
it did was relay the request to the JT at whatever host it had been 
assigned. All the clients needed to know was the URL of the config 
server, and they could bootstrap to working against clusters whose FS 
and JT URLs were different from run to run.
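
A minimal sketch of that bootstrap pattern using the stock Hadoop
Configuration class; the config-server URL and the keys printed below are
only examples:

import java.net.URL;

import org.apache.hadoop.conf.Configuration;

public class BootstrapConfig {
    public static Configuration load(String configServerUrl) throws Exception {
        // The config-server URL is the only thing a client needs to know,
        // e.g. "http://configserver.example.com/cluster-site.xml" (invented).
        Configuration conf = new Configuration();
        conf.addResource(new URL(configServerUrl));
        // The served XML carries the per-run FS and JT locations.
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
        return conf;
    }
}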

zookeeper discovery would benefit a lot of projects


Re: Hadoop Master and Slave Discovery

Posted by Eric Yang <er...@gmail.com>.
In another project, I have implemented a Bonjour beacon (jmdns) which
sits on the ZooKeeper nodes to advertise the location of the ZooKeeper
servers.  When clients start up, they discover the location of
ZooKeeper through multicast DNS.  Once the server locations are
resolved (ip:port and TXT records), the clients shut down their mDNS
resolvers.  Clients then proceed to use the resolved list for ZooKeeper
access.  There does not seem to be any CPU overhead incurred by the
beacon or the clients.  If a client can no longer connect to ZooKeeper,
it starts the mDNS resolvers again to look for a new list of ZooKeeper
servers.  The code for the project is located at:

http://github.com/macroadster/hms

It may be possible to use a similar approach for location resolution
and load the rest of the config through ZooKeeper.
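
A minimal sketch of the beacon and resolver sides, assuming a 3.x version
of the jmdns library; the service type string and TXT payload are
illustrative and the actual HMS code may differ:

import java.net.InetAddress;

import javax.jmdns.JmDNS;
import javax.jmdns.ServiceInfo;

public class ZkBeacon {
    // Illustrative service type; not necessarily what HMS advertises.
    static final String TYPE = "_zookeeper._tcp.local.";

    // Runs on each ZooKeeper node: advertise host:port plus TXT data.
    public static void advertise(int clientPort) throws Exception {
        JmDNS jmdns = JmDNS.create(InetAddress.getLocalHost());
        ServiceInfo info = ServiceInfo.create(TYPE,
                InetAddress.getLocalHost().getHostName(), clientPort, "ensemble=zk");
        jmdns.registerService(info);
    }

    // Runs on clients at startup, or after losing the ensemble.
    public static String resolveConnectString() throws Exception {
        JmDNS jmdns = JmDNS.create(InetAddress.getLocalHost());
        StringBuilder sb = new StringBuilder();
        for (ServiceInfo info : jmdns.list(TYPE)) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(info.getHostAddresses()[0]).append(':').append(info.getPort());
        }
        jmdns.close();   // the resolver is shut down once the list is resolved
        return sb.toString();
    }
}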

regards,
Eric

On Tue, Jul 5, 2011 at 2:40 AM, Steve Loughran <st...@apache.org> wrote:
> On 04/07/11 18:22, Ted Dunning wrote:
>>
>> One reasonable suggestion that I have heard recently was to do like Google
>> does and put a DNS front end onto Zookeeper.  Machines would need to have
>> DNS set up properly and a requests for a special ZK based domain would
>> have
>> to be delegated to the fancy DNS setup, but this would allow all kinds of
>> host targeted configuration settings to be moved by a very standardized
>> network protocol.  There are difficulties moving port numbers and all you
>> really get are hostnames, but it is a nice trick since configuration of
>> which nameserver to use is a common admin task.
>
> good point
>
> 1. you could use DNS proper, by way of Bonjour/avahi. You don't need to be
> running any mDNS server to support .local, and I would strongly advise
> against it in a large cluster (because .local resolution puts a lot of CPU
> load on every server in the subnet). What you can do is have the DNS server
> register some .local entries and have the clients use this to bind. You
> probably also need to set the dns TTLs in the JVM. In a large clusters
> that'll just add to the DNS traffic, so it's where host tables start to look
> appealing
>
> 2. Apache MINA is set up to serve its directory data in lots of ways,
> including what appears to be text files over NFS. This is an even nicer
> trick. If you could get MINA to serve up the ZK data, life is very simple
>
>
>>
>> On Mon, Jul 4, 2011 at 3:44 AM, Steve Loughran<st...@apache.org>  wrote:
>>
>>> On 03/07/11 03:11, Raja Nagendra Kumar wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> Instead of depending on local syncup to configuration files, would it be
>>>> a
>>>> nice way to adopt JINI Discovery model, where in masters and slaves can
>>>> discover each other dynamically through a UDP broadcast/heart beat
>>>> methods
>>>>
>>>
>>> That assumes that UDP Broadcast is supported through the switches (many
>>> turn it off as it creates too much traffic), or UDP multicast is
>>> supported
>>> (as an example of an infrastructure that does not, play with EC2)
>>>
>>>
>>>  This would mean, any machine can come up and say I am a slave and
>>>>
>>>> automatically discover the master and start supporting the master with
>>>> in<
>>>> x seconds.
>>>>
>>>>
>>>
>>> How will your slave determine that the master that it has bonded to is
>>> the
>>> master that it should bond to and not something malicious within the same
>>> multicast range? It's possible, but you have generally have to have
>>> configuration files in the worker nodes.
>>>
>>> There's nothing to stop your Configuration Management layer using
>>> discovery
>>> or central config servers (Zookeeper, Anubis, LDAP, DNS, ...), which then
>>> pushes desired state information to the client nodes. These can deal with
>>> auth and may support infrastructures that don't support broadcast or
>>> multicast. Such tooling also gives you the ability to push out host table
>>> config, JVM options, logging parameters, and bounce worker nodes into new
>>> states without waiting for them to timeout and try to rediscover new
>>> masters.
>>>
>>
>
>

Re: Hadoop Master and Slave Discovery

Posted by Steve Loughran <st...@apache.org>.
On 04/07/11 18:22, Ted Dunning wrote:
> One reasonable suggestion that I have heard recently was to do like Google
> does and put a DNS front end onto Zookeeper.  Machines would need to have
> DNS set up properly and a requests for a special ZK based domain would have
> to be delegated to the fancy DNS setup, but this would allow all kinds of
> host targeted configuration settings to be moved by a very standardized
> network protocol.  There are difficulties moving port numbers and all you
> really get are hostnames, but it is a nice trick since configuration of
> which nameserver to use is a common admin task.

good point

1. You could use DNS proper, by way of Bonjour/avahi. You don't need to
be running any mDNS server to support .local, and I would strongly
advise against it in a large cluster (because .local resolution puts a
lot of CPU load on every server in the subnet). What you can do is have
the DNS server register some .local entries and have the clients use
these to bind. You probably also need to set the DNS TTLs in the JVM
(see the sketch after point 2). In a large cluster that'll just add to
the DNS traffic, so that's where host tables start to look appealing.

2. Apache MINA is set up to serve its directory data in lots of ways, 
including what appears to be text files over NFS. This is an even nicer 
trick. If you could get MINA to serve up the ZK data, life is very simple
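
On the DNS TTL point in 1., a minimal sketch of the JVM-level knobs; the
values are arbitrary examples:

import java.security.Security;

public class DnsTtl {
    public static void tuneDnsCaching() {
        // The JVM can cache successful lookups for a long time (forever when a
        // security manager is installed); cap it so DNS changes are picked up.
        Security.setProperty("networkaddress.cache.ttl", "30");
        Security.setProperty("networkaddress.cache.negative.ttl", "10");
    }
}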


>
> On Mon, Jul 4, 2011 at 3:44 AM, Steve Loughran<st...@apache.org>  wrote:
>
>> On 03/07/11 03:11, Raja Nagendra Kumar wrote:
>>
>>>
>>> Hi,
>>>
>>> Instead of depending on local syncup to configuration files, would it be a
>>> nice way to adopt JINI Discovery model, where in masters and slaves can
>>> discover each other dynamically through a UDP broadcast/heart beat methods
>>>
>>
>> That assumes that UDP Broadcast is supported through the switches (many
>> turn it off as it creates too much traffic), or UDP multicast is supported
>> (as an example of an infrastructure that does not, play with EC2)
>>
>>
>>   This would mean, any machine can come up and say I am a slave and
>>> automatically discover the master and start supporting the master with in<
>>> x seconds.
>>>
>>>
>>
>> How will your slave determine that the master that it has bonded to is the
>> master that it should bond to and not something malicious within the same
>> multicast range? It's possible, but you have generally have to have
>> configuration files in the worker nodes.
>>
>> There's nothing to stop your Configuration Management layer using discovery
>> or central config servers (Zookeeper, Anubis, LDAP, DNS, ...), which then
>> pushes desired state information to the client nodes. These can deal with
>> auth and may support infrastructures that don't support broadcast or
>> multicast. Such tooling also gives you the ability to push out host table
>> config, JVM options, logging parameters, and bounce worker nodes into new
>> states without waiting for them to timeout and try to rediscover new
>> masters.
>>
>


Re: Hadoop Master and Slave Discovery

Posted by Ted Dunning <td...@maprtech.com>.
One reasonable suggestion that I have heard recently was to do as Google
does and put a DNS front end onto ZooKeeper.  Machines would need to have
DNS set up properly, and requests for a special ZK-based domain would have
to be delegated to the fancy DNS setup, but this would allow all kinds of
host-targeted configuration settings to be moved over a very standardized
network protocol.  There are difficulties moving port numbers, and all you
really get are hostnames, but it is a nice trick, since configuration of
which nameserver to use is a common admin task.
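
A minimal sketch of how a client might query such a zone through the JVM's
built-in JNDI DNS provider; the zone name and role names are invented:

import java.util.Hashtable;

import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.InitialDirContext;

public class ZkDnsLookup {
    // Ask the (hypothetical) ZooKeeper-backed DNS zone where a role lives.
    public static String lookupHost(String role, String cluster) throws Exception {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        InitialDirContext dns = new InitialDirContext(env);
        // e.g. jobtracker.mycluster.zk.example.com, served from ZooKeeper data
        Attributes attrs = dns.getAttributes(
                role + "." + cluster + ".zk.example.com", new String[] { "A" });
        Attribute a = attrs.get("A");
        return a == null ? null : (String) a.get();
    }
}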

On Mon, Jul 4, 2011 at 3:44 AM, Steve Loughran <st...@apache.org> wrote:

> On 03/07/11 03:11, Raja Nagendra Kumar wrote:
>
>>
>> Hi,
>>
>> Instead of depending on local syncup to configuration files, would it be a
>> nice way to adopt JINI Discovery model, where in masters and slaves can
>> discover each other dynamically through a UDP broadcast/heart beat methods
>>
>
> That assumes that UDP Broadcast is supported through the switches (many
> turn it off as it creates too much traffic), or UDP multicast is supported
> (as an example of an infrastructure that does not, play with EC2)
>
>
>  This would mean, any machine can come up and say I am a slave and
>> automatically discover the master and start supporting the master with in<
>> x seconds.
>>
>>
>
> How will your slave determine that the master that it has bonded to is the
> master that it should bond to and not something malicious within the same
> multicast range? It's possible, but you have generally have to have
> configuration files in the worker nodes.
>
> There's nothing to stop your Configuration Management layer using discovery
> or central config servers (Zookeeper, Anubis, LDAP, DNS, ...), which then
> pushes desired state information to the client nodes. These can deal with
> auth and may support infrastructures that don't support broadcast or
> multicast. Such tooling also gives you the ability to push out host table
> config, JVM options, logging parameters, and bounce worker nodes into new
> states without waiting for them to timeout and try to rediscover new
> masters.
>

Re: Hadoop Master and Slave Discovery

Posted by Steve Loughran <st...@apache.org>.
On 03/07/11 03:11, Raja Nagendra Kumar wrote:
>
> Hi,
>
> Instead of depending on local syncup to configuration files, would it be a
> nice way to adopt JINI Discovery model, where in masters and slaves can
> discover each other dynamically through a UDP broadcast/heart beat methods

That assumes that UDP broadcast is supported through the switches (many
turn it off as it creates too much traffic), or that UDP multicast is
supported (for an example of an infrastructure that does not, play with EC2).


> This would mean, any machine can come up and say I am a slave and
> automatically discover the master and start supporting the master with in<
> x seconds.
>


How will your slave determine that the master that it has bonded to is
the master that it should bond to, and not something malicious within the
same multicast range? It's possible, but you generally have to have
configuration files on the worker nodes.

There's nothing to stop your Configuration Management layer from using
discovery or central config servers (Zookeeper, Anubis, LDAP, DNS, ...),
which then pushes desired state information to the client nodes. These
can deal with auth and may support infrastructures that don't support
broadcast or multicast. Such tooling also gives you the ability to push
out host table config, JVM options, logging parameters, and bounce
worker nodes into new states without waiting for them to time out and
try to rediscover new masters.