Posted to commits@slider.apache.org by st...@apache.org on 2014/06/18 00:31:38 UTC

[3/5] git commit: SLIDER-3 : service registry

SLIDER-3 : service registry


Project: http://git-wip-us.apache.org/repos/asf/incubator-slider/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-slider/commit/9b7000fa
Tree: http://git-wip-us.apache.org/repos/asf/incubator-slider/tree/9b7000fa
Diff: http://git-wip-us.apache.org/repos/asf/incubator-slider/diff/9b7000fa

Branch: refs/heads/develop
Commit: 9b7000fab354460a36c2af751e7697c643dbe414
Parents: 93c1b3b
Author: Steve Loughran <st...@apache.org>
Authored: Mon Jun 16 16:58:02 2014 -0700
Committer: Steve Loughran <st...@apache.org>
Committed: Mon Jun 16 16:58:02 2014 -0700

----------------------------------------------------------------------
 .../registry/a_YARN_service_registry.md         |   1 -
 .../markdown/registry/p2p_service_registries.md |  78 ++++++++++----
 ...lication_registration_and_binding_problem.md | 107 +++++++++++++------
 3 files changed, 135 insertions(+), 51 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/9b7000fa/src/site/markdown/registry/a_YARN_service_registry.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/registry/a_YARN_service_registry.md b/src/site/markdown/registry/a_YARN_service_registry.md
index 23cfce9..b695106 100644
--- a/src/site/markdown/registry/a_YARN_service_registry.md
+++ b/src/site/markdown/registry/a_YARN_service_registry.md
@@ -224,4 +224,3 @@ This isn't a registry service directly, though LDAP queries do make enumeration
 
 If service information were to be published via LDAP, then it should allow IT-managed LDAP services to both host this information, and publish configuration data. This would be relevant for classic Hadoop applications if we were to move the Configuration class to support back-end configuration sources beyond XML files on the classpath.
 
-# Proposal

http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/9b7000fa/src/site/markdown/registry/p2p_service_registries.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/registry/p2p_service_registries.md b/src/site/markdown/registry/p2p_service_registries.md
index 2cf953c..eaf5097 100644
--- a/src/site/markdown/registry/p2p_service_registries.md
+++ b/src/site/markdown/registry/p2p_service_registries.md
@@ -17,9 +17,11 @@
   
 # P2P Service Registries for Apache Slider
 
-Alongside the centralized service registries, there's much prior work on P2P discovery systems, especially for mobile and consumer devices.
+Alongside the centralized service registries, there's much prior work on
+P2P discovery systems, especially for mobile and consumer devices.
 
-They perform some multicast- or distributed hash table-based lookup, and tend to have common limitations:
+They perform some multicast- or distributed hash table-based lookup,
+and tend to have common limitations:
 
 * scalability
 
@@ -29,15 +31,29 @@ They perform some multicast- or distributed hash table-based lookup, and tend to
 
 * consistency: can you trust the results to be complete and current?
 
-Bootstrapping is usually done via multicast, possibly then switching to unicast for better scale. As multicasting doesn't work in cloud infrastructures, none of the services work unmodified  in public clouds. There's multiple anecdotes of [Amazon's SimpleDB service](http://aws.amazon.com/simpledb/) being used as a registry for in-EC2 applications. At the very least, this service and its equivalents in other cloud providers could be used to bootstrap ZK client bindings in cloud environments. 
+Bootstrapping is usually done via multicast, possibly then switching
+to unicast for better scale. As multicasting doesn't work in cloud
+infrastructures, none of the services work unmodified in public
+clouds. There are multiple anecdotes of
+[Amazon's SimpleDB service](http://aws.amazon.com/simpledb/) being used as a
+registry for in-EC2 applications. At the very least, this service and its
+equivalents in other cloud providers could be used to bootstrap ZK client
+bindings in cloud environments.
 
 ## Service Location Protocol 
 
-Service Location Protocol is a protocol for discovery services that came out of Sun, Novell and others -it is still available for printer discovery and suchlike
+Service Location Protocol (SLP) is a service discovery protocol that came out
+of Sun, Novell and others -it is still used for printer discovery and
+the like.
 
-It supports both a multicast discovery mechanism, and a unicast protocol to talk to a Directory Agent -an agent that is itself discovered by multicast requests, or by listening for the agent's intermittent multicast announcements.
+It supports both a multicast discovery mechanism, and a unicast protocol
+to talk to a Directory Agent -an agent that is itself discovered by multicast
+requests, or by listening for the agent's intermittent multicast announcements.
 
-There's an extension to DHCP, RFC2610, which added the ability for DHCP to advertise Directory Agents -this was designed to solve the bootstrap problem (though not necessarily security or in-cloud deployment). Apart from a few mentions in Windows Server technical notes, it does not appear to exist.
+There's an extension to DHCP, RFC2610, which added the ability for DHCP to
+advertise Directory Agents -this was designed to solve the bootstrap problem
+(though not necessarily security or in-cloud deployment). Apart from a few
+mentions in Windows Server technical notes, it does not appear to be in use.
 
 * [[RFC2608](http://www.ietf.org/rfc/rfc2608.txt)] *Service Location Protocol, Version 2*, IETF, 1999
 
@@ -47,17 +63,23 @@ There's an extension to DHCP, RFC2610, which added the ability for DHCP to adver
 
 ## [Zeroconf](http://www.zeroconf.org/)
 
-The multicast discovery service implemented in Apple's Bonjour system -multicasting DNS lookups to all peers in the subnet.
+The multicast discovery service implemented in Apple's Bonjour system
+-it multicasts DNS lookups to all peers in the subnet.
 
-This allows for URLs and hostnames to be dynamically positioned, with DNS domain searches allowing for enumeration of service groups. 
+This allows URLs and hostnames to be resolved dynamically, with
+DNS domain searches allowing enumeration of service groups.
 
-This protocol scales very badly; the load on *every* client in the subnet is is O(DNS-queries-across-subnet), hence implicitly `O(devices)*O(device-activity)`. 
+This protocol scales very badly; the load on *every* client in the
+subnet is O(DNS-queries-across-subnet), hence implicitly `O(devices)*O(device-activity)`.
 
-The special domains "_tcp", "_udp"  and below can also be served up via a normal DNS server.
+The special domains `_tcp.`, `_udp.` and their subdomains can also be
+served up via a normal DNS server.
 
 ##  [Jini/Apache River](http://river.apache.org/doc/specs/html/lookup-spec.html)
 
-Attribute-driven service enumeration, which drives the, Java-client-only model of downloading client-side code. There's no requirement for the remote services to be in Java, only that drivers are.
+Attribute-driven service enumeration, which drives the Java-client-only
+model of downloading client-side code. There's no requirement for the remote
+services to be in Java, only that the drivers are.
 
 ## [Serf](http://www.serfdom.io/)  
 
@@ -71,13 +93,17 @@ Strengths:
 
 * The shared knowledge mechanism permits reasoning and mathematical proofs.
 
-* Strict ordering between heartbeats implies an ordering in receipt. This is stronger than ZK's guarantees.
+* Strict ordering between heartbeats implies an ordering in receipt.
+This is stronger than ZK's guarantees.
 
-* Lets you share a moderate amount of data (the longer the heartbeat interval, the more data you can publish).
+* Lets you share a moderate amount of data (the longer the heartbeat
+interval, the more data you can publish).
 
-* Provided the JVM hosting the Anubis agent is also hosting the service, liveness is implicit
+* Provided the JVM hosting the Anubis agent is also hosting the service,
+liveness is implicit.
 
-* Secure to the extent that it can be locked down to allow only nodes with mutual trust of HTTPS certificates to join the tuple-space.
+* Secure to the extent that it can be locked down to allow only nodes with
+mutual trust of HTTPS certificates to join the tuple-space.
 
 Weaknesses:
 
@@ -85,15 +111,27 @@ Weaknesses
 
 * Brittle to timing, especially on virtualized clusters where clocks are unpredictable.
 
-It proved good for workload sharing -tasks can be published to it, any agent can say "I'm working on it" and take up the work. If the process fails, the task becomes available again. We used this for distributed scheduling in a rendering farm.
+It proved good for workload sharing -tasks can be published to it, any
+agent can say "I'm working on it" and take up the work. If the process
+fails, the task becomes available again. We used this for distributed scheduling in a rendering farm.
 
 ## [Carmen](http://www.hpl.hp.com/techreports/2002/HPL-2002-257)
 
-This was another HP Labs project, related to the Cooltown "ubiquitous computing" work, which was a decade too early to be relevant. It was also positioned by management as a B2B platform, so ended up competing with - and losing against - WS-* and UDDI.. 
+This was another HP Labs project, related to the Cooltown "ubiquitous
+computing" work, which was a decade too early to be relevant. It was
+also positioned by management as a B2B platform, so ended up competing
+with - and losing against - WS-* and UDDI. 
 
-Carmen aimed to provide service discovery with both fixed services, and with highly mobile client services that will roam around the network -they are assumed to be wireless devices.
+Carmen aimed to provide service discovery with both fixed services and
+highly mobile client services that roam around the network -they
+are assumed to be wireless devices.
 
-Services were published with and searched for by attributed, locality was considered to be a key attribute -local instances of a service prioritized. Those services with a static location and low rate of change became the stable caches of service information -becoming, as with skype, "supernodes". 
+Services were published with, and searched for by, attributes; locality
+was considered a key attribute, with local instances of a service
+prioritized. Those services with a static location and low rate of
+change became the stable caches of service information -becoming,
+as with Skype, "supernodes".
 
-Bootstrapping the cluster relied on multicast, though alternatives based on DHCP and DNS were proposed.
+Bootstrapping the cluster relied on multicast, though alternatives
+based on DHCP and DNS were proposed.
 

http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/9b7000fa/src/site/markdown/registry/the_YARN_application_registration_and_binding_problem.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/registry/the_YARN_application_registration_and_binding_problem.md b/src/site/markdown/registry/the_YARN_application_registration_and_binding_problem.md
index 0efa420..ab2493d 100644
--- a/src/site/markdown/registry/the_YARN_application_registration_and_binding_problem.md
+++ b/src/site/markdown/registry/the_YARN_application_registration_and_binding_problem.md
@@ -28,59 +28,85 @@ There are some constraints here
 
 1. The location of an application deployed in a YARN cluster cannot be predicted.
 
-2. The ports used for application service endpoints cannot be hard-coded or predicted. (Alternatively: if they are hard-coded, then Socket-In-Use exceptions may occur)
-
-3: As components fail and get re-instantiated, their location may change. The rate of this depends on cluster and application stability; the longer lived the application, the more common it is.
-
-Existing Hadoop client apps have a configuration problem of their own: how are the settings in files such as `yarn-site.xml`picked up by today's applications? This is an issue which has historically been out of scope for Hadoop clusters -but if we are looking at registration and binding of YARN applications, there should be no reason why
+1. The ports used for application service endpoints cannot be hard-coded
+or predicted. (Alternatively: if they are hard-coded, then Socket-In-Use exceptions may occur)
+
+1. As components fail and get re-instantiated, their location may change.
+The rate of this depends on cluster and application stability; the longer-lived
+the application, the more common it is.
+
+Existing Hadoop client apps have a configuration problem of their own:
+how are the settings in files such as `yarn-site.xml` picked up by today's
+applications? This is an issue which has historically been out of scope
+for Hadoop clusters -but if we are looking at registration and binding
+of YARN applications, there should be no reason why
 static applications cannot be discovered and bound to using the same mechanisms.
 
 # Other constraints:
 
-1. Reduce the amount of change needed in existing applications to a minimum -ideally none, though some pre-launch setup may be acceptable.
+1. Reduce the amount of change needed in existing applications to a minimum
+-ideally none, though some pre-launch setup may be acceptable.
 
 2. Prevent malicious applications from registering service endpoints.
 
 3. Scale with the # of applications and # of clients; don't overload during a cluster partition.
 
-4. Offer a design that works with apps that are deployed in a YARN custer outside of Slider. Rationale: want a mechanism that works with pure-YARN apps
+4. Offer a design that works with apps that are deployed in a YARN cluster
+outside of Slider. Rationale: we want a mechanism that works with pure-YARN apps.
 
 ## Possible Solutions:
 
 ### ZK
 
-Client applications use ZK to find services (addresses #1, #2 and #3). Requires location code in the client.
+Client applications use ZK to find services (addresses #1, #2 and #3).
+Requires location code in the client.
 
 HBase and Accumulo do this as part of a failover-ready design.
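+
+As an illustration of what "location code in the client" means here, a minimal
+sketch is shown below; the quorum address, znode path and payload format are
+invented for the example, not a defined Slider layout:
+
+    import java.util.concurrent.CountDownLatch;
+
+    import org.apache.zookeeper.WatchedEvent;
+    import org.apache.zookeeper.Watcher;
+    import org.apache.zookeeper.ZooKeeper;
+
+    public class ZkServiceLookup {
+      public static void main(String[] args) throws Exception {
+        final CountDownLatch connected = new CountDownLatch(1);
+        // Hypothetical quorum; in practice this comes from client-side configuration.
+        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000,
+            new Watcher() {
+              public void process(WatchedEvent event) {
+                if (event.getState() == Event.KeeperState.SyncConnected) {
+                  connected.countDown();
+                }
+              }
+            });
+        try {
+          connected.await();
+          // Assume the application registered its endpoint as UTF-8 "host:port" data.
+          byte[] data = zk.getData("/services/myapp/instances/instance-0001", false, null);
+          System.out.println("Service endpoint: " + new String(data, "UTF-8"));
+        } finally {
+          zk.close();
+        }
+      }
+    }
+
+A failover-ready client would also watch the node (or its parent) and re-read
+the binding when it changes, in the spirit of the HBase and Accumulo approach
+mentioned above.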
 
 ### DNS
 
-Client apps use DNS to find services, with custom DNS server for a subdomain representing YARN services. Addresses #1; with a shortened TTL and no DNS address caching, #3. #2 addressed only if other DNS entries are used to publish service entries. 
+Client apps use DNS to find services, with a custom DNS server for a
+subdomain representing YARN services. Addresses #1; with a shortened TTL and
+no DNS address caching, #3. #2 is addressed only if other DNS entries are used
+to publish service entries.
 
-Should support existing applications, with a configuration that is stable over time. It does require the clients to not cache DNS addresses forever (this must be explicitly set on Java applications,
-irrespective of the published TTL). It generates a load on the DNS servers that is `O(clients/TTL)`
+This should support existing applications, with a configuration that is stable
+over time. It does require the clients to not cache DNS addresses forever
+(this must be explicitly set on Java applications,
+irrespective of the published TTL). It generates a load on the DNS servers
+that is `O(clients/TTL)`.
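+
+As a sketch of that client-side caveat: a JVM can bound its DNS caching via the
+`networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security
+properties, provided they are set before the first lookup. The values below are
+arbitrary examples, not recommendations:
+
+    import java.security.Security;
+
+    public class DnsCacheSetup {
+      public static void main(String[] args) {
+        // Cache successful lookups for 30 seconds instead of the JVM default.
+        Security.setProperty("networkaddress.cache.ttl", "30");
+        // Cache failed lookups for 10 seconds.
+        Security.setProperty("networkaddress.cache.negative.ttl", "10");
+      }
+    }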
 
 Google Chubby offers a DNS service to handle this. ZK does not -yet.
 
 ### Floating IP Addresses
 
-If the clients know/cache IP addresses of services, these addresses could be floated across service instances. Linux HA has floating IP address support, while Docker containers can make use of them, especially if an integrated DHCP server handles the assignment of IP addresses to specific containers. 
+If the clients know/cache IP addresses of services, these addresses could be
+floated across service instances. Linux HA has floating IP address support,
+while Docker containers can make use of them, especially if an integrated DHCP
+server handles the assignment of IP addresses to specific containers. 
 
-ARP caching is the inevitable problem here, but it is still less brittle than relying on applications to know not to cache IP addresses -and nor does it place so much on DNS servers as short-TTL DNS entries.
+ARP caching is the inevitable problem here, but it is still less brittle than
+relying on applications to know not to cache IP addresses -nor does it
+place as much load on DNS servers as short-TTL DNS entries do.
 
 ### LDAP
 
-Enterprise Directory services are used to publish/locate services. Requires lookup into the directory on binding (#1, #2), re-lookup on failure (#3). LDAP permissions can prevent untrusted applications registering.
+Enterprise Directory services are used to publish/locate services. Requires
+lookup into the directory on binding (#1, #2), re-lookup on failure (#3).
+LDAP permissions can prevent untrusted applications from registering.
 
 * Works well with Windows registries.
 
-* Less common Java-side, though possible -and implemented in the core Java libraries. Spring-LDAP is focused on connection to an LDAP server -not LDAP-driven application config.
+* Less common Java-side, though possible -and implemented in the core Java
+libraries via JNDI (see the sketch below). Spring-LDAP is focused on connecting
+to an LDAP server -not on LDAP-driven application config.
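+
+A minimal JNDI lookup, with the directory URL, search base, filter and
+attribute name all invented for the example:
+
+    import java.util.Hashtable;
+
+    import javax.naming.Context;
+    import javax.naming.NamingEnumeration;
+    import javax.naming.directory.Attributes;
+    import javax.naming.directory.DirContext;
+    import javax.naming.directory.InitialDirContext;
+    import javax.naming.directory.SearchControls;
+    import javax.naming.directory.SearchResult;
+
+    public class LdapServiceLookup {
+      public static void main(String[] args) throws Exception {
+        Hashtable<String, String> env = new Hashtable<String, String>();
+        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
+        // Hypothetical directory server.
+        env.put(Context.PROVIDER_URL, "ldap://directory.example.org:389");
+        DirContext ctx = new InitialDirContext(env);
+        try {
+          SearchControls controls = new SearchControls();
+          controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
+          // Hypothetical subtree and filter for registered service entries.
+          NamingEnumeration<SearchResult> results =
+              ctx.search("ou=services,dc=example,dc=org", "(cn=myapp)", controls);
+          while (results.hasMore()) {
+            Attributes attrs = results.next().getAttributes();
+            // Assume each entry publishes its endpoint in a labeledURI attribute.
+            System.out.println("endpoint = " + attrs.get("labeledURI").get());
+          }
+        } finally {
+          ctx.close();
+        }
+      }
+    }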
 
 ### Registration Web Service
 
 Custom web service registration services would be used.
 
-* The sole WS-* one, UDDI, does not have a REST equivalent -DNS is assumed to take on that role.
+* The sole WS-* one, UDDI, does not have a REST equivalent
+-DNS is assumed to take on that role.
 
 * Requires new client-side code anyway.
 
@@ -90,15 +116,24 @@ Offer our own `zk://` URL; java & .NET implementations (others?) to resolve, bro
 
 * Would address requirements #1 & #3
 
-* Cost: non-standard; needs an extension for every application/platform, and will not work with tools such as CURL or web browsers
+* Cost: non-standard; needs an extension for every application/platform, and
+will not work with tools such as curl or web browsers.
 
 ### AM-side config generation
 
-App-side config generation-YARN applications to generate client-side configuration files for launch-time information (#1, #2). The AM can dynamically create these, and as the storage load is all in the AM, does not consume as much resources in a central server as would publishing it all to that central server.
+App-side config generation -YARN applications generate client-side
+configuration files for launch-time information (#1, #2).
+The AM can dynamically create these, and as the storage load is all in
+the AM, this does not consume as many resources in a central server as
+publishing it all to that central server would.
 
-* Requires application to know of client-side applications to support - and be able to generate to their configuration information (i.e. formatted files).
+* Requires the application to know of the client-side applications to support -
+and to be able to generate their configuration information (i.e. formatted files).
 
-* Requires the AM to get all information from deployed application components needed to generate bindings. Unless the AM can resolve YARN App templates, need a way to get one of the components in the app to generate settings for the entire cluster, and push them back.
+* Requires the AM to get from the deployed application components all the
+information needed to generate bindings. Unless the AM can resolve YARN App
+templates, we need a way to get one of the components in the app to generate
+settings for the entire cluster, and push them back.
 
 * Needs to be repeated for all YARN apps, however deployed.
 
@@ -107,15 +142,21 @@ App-side config generation-YARN applications to generate client-side configurati
 
 ### Client-side config generation
 
-YARN app to publish attributes as key-val pairs, client-side code to read and generate configs from  (#1, #2).  Example configuration generators could create: Hadoop-client XML, Spring, tomcat, guice configs, something for .NET.
+The YARN app publishes attributes as key-val pairs; client-side code reads them
+and generates configs from them (#1, #2). Example configuration generators could
+create: Hadoop-client XML, Spring, Tomcat, Guice configs, something for .NET.
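+
+For the Hadoop-client XML case, a generator could be as simple as the sketch
+below; the published keys and the output filename are invented for the example:
+
+    import java.io.FileOutputStream;
+    import java.io.OutputStream;
+    import java.util.HashMap;
+    import java.util.Map;
+
+    import org.apache.hadoop.conf.Configuration;
+
+    public class ClientConfigGenerator {
+      public static void main(String[] args) throws Exception {
+        // Hypothetical K-V pairs read from wherever the application published them.
+        Map<String, String> published = new HashMap<String, String>();
+        published.put("hbase.zookeeper.quorum", "zk1,zk2,zk3");
+        published.put("hbase.zookeeper.property.clientPort", "2181");
+
+        // Build a Hadoop configuration from the pairs, without loading default resources.
+        Configuration conf = new Configuration(false);
+        for (Map.Entry<String, String> entry : published.entrySet()) {
+          conf.set(entry.getKey(), entry.getValue());
+        }
+
+        // Write it out as a client-side XML configuration file.
+        OutputStream out = new FileOutputStream("hbase-site.xml");
+        try {
+          conf.writeXml(out);
+        } finally {
+          out.close();
+        }
+      }
+    }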
 
 * Not limited to Hoya application deployments only.
 
-* K-V pairs need to be published "somewhere". A structured section in the ZK tree per app is the obvious location -though potentially expensive. An alternative is AM-published data.
+* K-V pairs need to be published "somewhere". A structured section in the
+ZK tree per app is the obvious location -though potentially expensive. An
+alternative is AM-published data.
 
-* Needs client-side code capable of extracting information from YARN cluster to generate client-specific configuration.
+* Needs client-side code capable of extracting information from the YARN cluster
+to generate client-specific configuration.
 
-* Assumes (key, value) pairs sufficient for client config generation. Again, some template expansion will aid here (this time: client-side interpretation).
+* Assumes (key, value) pairs are sufficient for client config generation. Again,
+some template expansion will aid here (this time: client-side interpretation).
 
 * Client config generators need to find and bind to the target application themselves.
 
@@ -123,23 +164,29 @@ YARN app to publish attributes as key-val pairs, client-side code to read and ge
 
 Multiple options:
 
-* Standard ZK structure for YARN applications (maybe: YARN itself to register paths in ZK & set up child permissions,so enforcing security).
+* Standard ZK structure for YARN applications (maybe: YARN itself to register
+paths in ZK and set up child permissions, thus enforcing security).
 
 * Agents to push dynamic information to ZK as K-V pairs
 
 * Agent provider on AM to fetch K-V pairs and include in status requests
 
-* CLI to fetch app config keys, echo out responses (needs client log4j settings to log all slf/log4j to stderr, so that app > results.txt would save results explicitly
+* CLI to fetch app config keys and echo out responses (needs client log4j settings
+to log all slf4j/log4j output to stderr, so that `app > results.txt` would save the results explicitly).
 
 * Client-side code per app to generate specific binding information.
 
 ### Load-balancer YARN app
 
-Spread requests round a set of registered handlers, e.g web servers. Support plugins for session binding/sharding. 
+Spread requests around a set of registered handlers, e.g. web servers. Support
+plugins for session binding/sharding.
 
-Some web servers can do this already; a custom YARN app could use grizzy embedded. Binding problem exists, but would support scaleable dispatch of values.
+Some web servers can do this already; a custom YARN app could use embedded
+Grizzly. The binding problem still exists, but this would support scalable dispatch of values.
 
-*  Could be offered as an AM extension (in provider, ...): scales well with #of apps in cluster, but adds initial location/failover problems.
+* Could be offered as an AM extension (in a provider, ...): scales well
+with the # of apps in the cluster, but adds initial location/failover problems.
 
-* If offered as a core-YARN service, location is handled via a fixed URL. This could place high load on the service, even just 302 redirects.
+* If offered as a core-YARN service, location is handled via a fixed
+URL. This could place a high load on the service, even with just 302 redirects
+(see the sketch below).
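+
+To illustrate the 302-redirect approach, here is a toy dispatcher built on the
+JDK's built-in HTTP server rather than Grizzly; the target lookup is a stub and
+the host/port names are invented:
+
+    import java.io.IOException;
+    import java.net.InetSocketAddress;
+
+    import com.sun.net.httpserver.HttpExchange;
+    import com.sun.net.httpserver.HttpHandler;
+    import com.sun.net.httpserver.HttpServer;
+
+    public class RedirectDispatcher {
+      public static void main(String[] args) throws IOException {
+        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
+        server.createContext("/", new HttpHandler() {
+          public void handle(HttpExchange exchange) throws IOException {
+            // Stub lookup: a real dispatcher would pick a live, registered instance here.
+            String target = "http://worker17.example.org:41100" + exchange.getRequestURI();
+            exchange.getResponseHeaders().add("Location", target);
+            exchange.sendResponseHeaders(302, -1); // 302 with no response body
+            exchange.close();
+          }
+        });
+        server.start();
+      }
+    }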