Posted to commits@slider.apache.org by bi...@apache.org on 2014/06/24 21:46:39 UTC

svn commit: r1605167 [3/9] - in /incubator/slider/site: branches/ trunk/ trunk/cgi-bin/ trunk/content/ trunk/content/css/ trunk/content/design/ trunk/content/design/registry/ trunk/content/design/specification/ trunk/content/developing/ trunk/content/d...

Propchange: incubator/slider/site/trunk/content/css/bootstrap.css
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/css/bootstrap.css.map
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/css/bootstrap.css.map?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/css/bootstrap.css.map (added)
+++ incubator/slider/site/trunk/content/css/bootstrap.css.map Tue Jun 24 19:46:37 2014
@@ -0,0 +1 @@

[... 3 lines stripped ...]
Added: incubator/slider/site/trunk/content/css/bootstrap.min.css
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/css/bootstrap.min.css?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/css/bootstrap.min.css (added)
+++ incubator/slider/site/trunk/content/css/bootstrap.min.css Tue Jun 24 19:46:37 2014
@@ -0,0 +1,7 @@
+/*!
+ * Bootstrap v3.1.1 (http://getbootstrap.com)
+ * Copyright 2011-2014 Twitter, Inc.
+ * Licensed under MIT (https://github.com/twbs/bootstrap/blob/master/LICENSE)
+ */
+

[... 3 lines stripped ...]
Propchange: incubator/slider/site/trunk/content/css/bootstrap.min.css
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/css/slider.css
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/css/slider.css?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/css/slider.css (added)
+++ incubator/slider/site/trunk/content/css/slider.css Tue Jun 24 19:46:37 2014
@@ -0,0 +1,84 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+body {
+    /* for fixed top navbar */
+    padding-top: 70px;
+    font-size: 16px;
+}
+ul.nav li.dropdown:hover > ul.dropdown-menu {
+    /* so nav submenus open on hover */
+    display: block;
+}
+#sidebar {
+    font-size: 14px;
+}
+#sociallinks td {
+    /* no lines in the link table */
+    border-top: none;
+}
+#bannertext {
+    margin-top: 10px;
+    text-align: left;
+}
+.clear {
+    clear: both;
+}
+#content {
+    padding: 0 8px 40px;
+}
+#content h1 {
+    margin-bottom: 0.5em;
+}
+#content h2 {
+    margin-bottom: 0.5em;
+    border-bottom: 1px solid #CCCCCC;
+    padding-bottom: 0.25em;
+}
+#content h3 {
+    margin-bottom: 0.5em;
+}
+#content img {
+    vertical-align: middle;
+}
+#footer {
+    border-top: 1px solid #CCCCCC;
+    color: #666666;
+    font-size: 0.8em;
+    padding: 8px 8px;
+    text-align: center;
+}
+#asf-logo {
+    float: left;
+    padding-top: 15px;
+}
+
+#download-button-sidebar {
+    width: 60%;
+    margin-left: auto;
+    margin-right: auto;
+}
+
+div.copyright {
+    width: 60%;
+    margin-left: auto;
+    margin-right: auto;
+}
+
+code {
+    /* override nowrap in bootstrap */
+    white-space: normal;
+}

Propchange: incubator/slider/site/trunk/content/css/slider.css
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/design/architecture.md
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/architecture.md?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/design/architecture.md (added)
+++ incubator/slider/site/trunk/content/design/architecture.md Tue Jun 24 19:46:37 2014
@@ -0,0 +1,142 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+# Apache Slider Architecture
+
+## Summary
+
+Slider is a YARN application to deploy non-YARN-enabled applications in a YARN cluster.
+
+Slider consists of a YARN application master, the "Slider AM", and a client
+application which communicates with YARN and the Slider AM via remote procedure
+calls and/or REST requests. The client application offers command line access
+as well as low-level API access for test purposes.
+
+The deployed application must be a program that can be run across a pool of
+YARN-managed servers, dynamically locating its peers. It is not Slider's
+responsibility to configure the peer servers, apart from some initial
+application-specific application instance configuration. (The full requirements
+of an application are [described in another document](/docs/slider_specs/application_needs.md).)
+
+Every application instance is described as a set of one or more *components*; each
+component can have a different program/command, and a different set of configuration
+options and parameters.
+
+The AM takes the details of which roles to start, and requests a YARN container
+for each component; it then monitors the state of the application instance, receiving messages
+from YARN when a remotely executed process finishes. When that happens, it deploys another instance of
+that component.
+
+
+## Slider Packaging
+
+A key goal of Slider is to support the deployment of existing applications into
+a YARN application instance, without having to extend Slider itself. 
+
+
+
+## AM Architecture
+
+The application master consists of
+
+ 1. The AM engine, which handles all integration with external services, specifically YARN and any Slider clients.
+ 1. A *provider* specific to deploying a class of applications.
+ 1. The Application State. 
+
+The Application State is the model of the application instance, containing
+
+ 1. A specification of the desired state of the application instance -the number of instances of each role, their YARN and process memory requirements and some other options. 
+ 1. A map of the current instances of each role across the YARN cluster, including reliability statistics of each node in the application instance used.
+ 1. [The Role History](/design/rolehistory.html) -a record of which nodes roles were deployed on, used for re-requesting the same nodes in future. This is persisted to disk and re-read if present, for faster application startup times.
+ 1. Queues to track outstanding requests, and released and starting nodes
+
+The Application Engine integrates with the outside world: the YARN Resource Manager ("the RM"), and the node-specific Node Managers, receiving events from the services, requesting or releasing containers via the RM, and starting applications on assigned containers.
+
+After any notification of a change in the state of the cluster (or an update to the client-supplied cluster specification), the Application Engine passes the information on to the Application State class, which updates its state and then returns a list of cluster operations to be submitted: requests for containers of different types -potentially on specified nodes, or requests to release containers.
+
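+As an illustration of that review cycle -a minimal sketch with invented names
+(`AppState`, `review()`), not the actual Slider classes- the diff of desired
+versus live state might look like:
+
+    import java.util.*;
+
+    // Hedged sketch: invented types, not the real org.apache.slider model classes.
+    class AppState {
+      Map<String, Integer> desired = new HashMap<>();    // role -> wanted instance count
+      Map<String, List<String>> live = new HashMap<>();  // role -> running container IDs
+
+      /** On any cluster event or spec update: diff desired against live state. */
+      List<String> review() {
+        List<String> ops = new ArrayList<>();
+        for (Map.Entry<String, Integer> e : desired.entrySet()) {
+          List<String> running = live.getOrDefault(e.getKey(), Collections.emptyList());
+          for (int i = running.size(); i < e.getValue(); i++)
+            ops.add("REQUEST " + e.getKey());             // ask the RM for a container
+          for (int i = running.size(); i > e.getValue(); i--)
+            ops.add("RELEASE " + running.get(i - 1));     // hand a container back
+        }
+        return ops;                                       // operations to submit to YARN
+      }
+    }
+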
+As those requests are met and allocation messages passed to the Application Engine, it works with the Application State to assign them to specific components, then invokes the provider to build up the launch context for that application.
+
+The provider has the task of populating container requests with the file references, environment variables and commands needed to start the provider's supported programs.
+
+The core provider deploys a minimal agent on the target containers, then, as the agent checks in to the agent provider's REST API, executes commands issued to it. 
+
+The set of commands this agent executes focuses on downloading archives from HDFS, expanding them, then running Python scripts which perform the
+actual configuration and execution of the target program -primarily through template expansion.
+
+
+To summarize: Slider is not a classic YARN analysis application, which allocates and schedules work across the cluster in short-to-medium-lived containers with the lifespan of a query or an analytics session; it is instead built for applications with a lifespan of days to months. Slider works to keep the actual state of its application cluster matching the desired state, while the application itself has the tasks of recovering from node failure, locating peer nodes and working with data in an HDFS filesystem.
+
+As such it is one of the first applications designed to use YARN as a platform for long-lived services -Samza being the other key example. These applications' needs of YARN are different, and their application manager design is focused around maintaining the distributed application in its desired state rather than the ongoing progress of submitted work.
+
+The clean model-view-controller split was implemented to isolate the model and aid mock testing of large clusters with simulated scale, and hence increase confidence that Slider can scale to work in large YARN clusters and with larger application instances. 
+
+
+
+### Failure Model
+
+The application master is designed to be a [crash-only application](https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf); clients are free to terminate
+the application instance by asking YARN directly.
+
+There is an RPC call to stop the application instance - this is a nicety which includes a message in the termination log, and
+could, in future, perhaps warn the provider that the application instance is being torn down. That is a potentially dangerous feature
+to add -as provider implementors may start to expect the method to be called reliably. Slider is designed to fail without
+warning, to rebuild its state on a YARN-initiated restart, and to be manually terminated without any advance notice.
+
+### RPC Interface
+
+
+The RPC interface allows the client to query the current application state, and to update it by pushing out a new JSON specification. 
+
+The core operations are
+
+* `getJSONClusterStatus()`: get the status of the application instance as a JSON document.
+* `flexCluster()`: update the desired count of role instances in the running application instance.
+* `stopCluster()`: stop the application instance.
+
+There are some other low-level operations for extra diagnostics and testing, but they are of limited importance.
+
+The `flexCluster()` call takes a JSON application instance specification and forwards it to the AM -which extracts the desired counts of each role to update the Application State. A change in the desired size of the application instance is treated like any reported node failure:
+it triggers a re-evaluation of the application state, building up the list of container add and release requests to make of
+the YARN resource manager.
+
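+(Purely for illustration -this is not the actual Slider JSON schema- a flex
+request conceptually reduces to new desired instance counts per role:)
+
+    {
+      "components": {
+        "master" : { "instances": "1" },
+        "worker" : { "instances": "4" }
+      }
+    }
+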
+The final operation, `stopCluster()`, stops the application instance. 
+
+### Security and Identity
+
+Slider's security model is described in detail in [an accompanying document](/docs/security.html).
+
+A Slider application instance is expected to access data belonging to the user creating the instance. 
+
+In a secure YARN cluster, this is done by acquiring Kerberos tokens in the client when the application instance is updated, tokens which
+are propagated to the Slider AM and thence to the deployed application containers themselves. These
+tokens are valid for a finite time period. 
+
+HBase has always required keytab files to be installed on every node in the Hadoop cluster for it to have secure access -this requirement
+holds for Slider-deployed HBase clusters. Slider does not itself adopt the responsibility of preparing or distributing these files;
+this must be done via another channel.
+
+In Hadoop 2.2, the tokens for communication between the Slider AM and YARN expire after -by default- 72 hours. The
+HDFS tokens will also expire after some time period. This places an upper bound on the lifespan of a Slider application (or any
+other long-lived YARN application) in a secure Hadoop cluster. 
+
+
+
+In an insecure Hadoop cluster, the Slider AM and its containers are likely to run in a different OS account from the submitting user.
+To enable access to the database files as that submitting user, the identity of the user is provided when the AM is created; the
+AM will pass this same identity down to the created containers. This information *identifies* the user -but does not *authenticate* them: they are trusted to be who they claim to be.
+
+ 

Propchange: incubator/slider/site/trunk/content/design/architecture.md
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/design/index.md
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/index.md?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/design/index.md (added)
+++ incubator/slider/site/trunk/content/design/index.md Tue Jun 24 19:46:37 2014
@@ -0,0 +1,24 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+  
+# Apache Slider Architecture
+
+* [Overview](/design/architecture.html)
+* [Application Needs](/docs/slider_specs/application_needs.html)
+* [Specification](/design/specification/index.html)
+* [Service Registry](/design/registry/index.html)
+* [Role history](/design/rolehistory.html) 

Propchange: incubator/slider/site/trunk/content/design/index.md
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/design/registry/a_YARN_service_registry.md
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/registry/a_YARN_service_registry.md?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/design/registry/a_YARN_service_registry.md (added)
+++ incubator/slider/site/trunk/content/design/registry/a_YARN_service_registry.md Tue Jun 24 19:46:37 2014
@@ -0,0 +1,226 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+# A YARN Service Registry for Apache Slider
+
+## April 2014
+
+# Introduction
+
+This document looks at the needs and options of a service registry.
+
+The core issue is that as the location(s) of a dynamically deployed application are unknown, the standard Hadoop and Java configuration model of some form of text files containing hostnames, ports and URLs no longer works. You cannot define up-front where a service will be.
+
+Some Hadoop applications -HBase and Accumulo- have solved this with custom ZK bindings. This works for the specific clients, but requires the HBase and Accumulo client JARs in order to work with the content (or a re-implementation with knowledge of the non-standard contents of the ZK nodes).
+
+Other YARN applications will need to publish their bindings -this includes, but is not limited to, Slider-deployed applications. Again, these applications can use their own registration and binding model, which would again require custom clients to locate the registry information and parse the contents.
+
+YARN provides some minimal publishing of AM remote endpoints: a URL to what is assumed to be a Web UI (not a REST API), and an IPC port. The URL is displayed in the YARN UI -in which case it is accessed via a proxy which (currently) only supports HTTP GET operations. The YARN API call to list all applications can be used to locate a named instance of an application by (user, application-type, name), and then obtain the raw URL and IPC endpoints. This enumeration process is an O(apps) operation on the YARN RM and only provides access to those two endpoints. Even with the raw URL, REST operations have proven "troublesome", due to a web filter which redirects all direct requests to the proxy -unless they come from the same host as the proxy.
+
+Hadoop client applications tend to retrieve all their configuration information from files in the local filesystem: hadoop-site.xml, hdfs-site.xml, hbase-site.xml, etc. This requires the configuration files to be present on all systems. Tools such as Ambari can keep the files in the servers up to date -assuming a low rate of change- but these tools do nothing for the client applications themselves. It is up to the cluster clients to (somehow) retrieve these files, and to keep their copies up to date. *This is a problem that exists with today's applications*.
+
+As an example, if a YARN client does not know the value of "yarn.application.classpath", it cannot successfully deploy any application in the YARN cluster which needs the cluster-side Hadoop and YARN JARs on its application master's classpath. This is not a theoretical problem, as some clusters have a different classpath from the default: without a correct value the Slider AM does not start. And, as it is designed to be run remotely, it cannot rely on a local installation of YARN to provide the correct values ([YARN-973](https://issues.apache.org/jira/browse/YARN-973)).
+
+# What do we need?
+
+**Discovery**: An IPC and URL discovery system for service-aware applications to use to look up a service to which they wish to talk. This is not an ORB -it's not doing redirection- but it is something that needs to be used before starting IPC or REST communications.
+
+**Configuration**: A way for clients of a service to retrieve more configuration data than simply the service endpoints. For example: everything needed to create a site.xml document.
+
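+For example, a client might need just enough key-value pairs to render a
+local `hbase-site.xml` -the values here are invented:
+
+    <configuration>
+      <property>
+        <name>hbase.zookeeper.quorum</name>
+        <value>host1,host2</value>
+      </property>
+    </configuration>
+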
+## Client-side
+
+* Allow clients of a YARN application to locate the service instance and its service ports (web, IPC, REST...) efficiently even on a large YARN cluster. 
+
+* Allow clients to retrieve configuration values which can be processed client-side into the configuration files and options which the application needs
+
+* Give clients confidence that the service with which they interact is the one they expect to interact with -not another potentially malicious service deployed by a different user. 
+
+* Allow clients to watch a service and receive notification of changes
+
+* Cross-language support.
+
+## For all Services
+
+* Allow services to publish the binding details of the AM and of code running in the containers (which may be published by the containers themselves)
+
+* Use entries in registry as a way of enforcing uniqueness of the instance (app, owner, name)? 
+
+* Values to update when a service is restarted on a different host
+
+* Values to indicate when a service is not running. This may be implicit "no entry found" or explicit "service exists but not running"
+
+* Services to be able to act as clients to other services
+
+## For Slider Services (and presumably others)
+
+* Ability to publish information about configuration documents that can be retrieved -and URLs
+
+* Ability to publish facts internal to the application (e.g. agent reporting URLs)
+
+* Ability to use service paths as a way to ensure a single instance of a named service can be deployed by a user
+
+## Management and PaaS UIs
+
+* Retrieve lists of web UI URLs of AM and of deployed components
+
+* Enumerate components and their status
+
+* Retrieve dynamic assignments of IPC ports
+
+* Retrieve dynamic assignments of JMX ports
+
+* Retrieve any health URLs for regular probes
+
+* Listen to changes in the service mix -the arrival and departure of service instances, as well as changes in their contents.
+
+
+
+## Other Needs
+
+* Registry-configured applications. In-cluster applications should be able to subscribe to part of the registry
+to pick up changes that affect them -both for their own application configuration, and for details about
+applications on which they themselves depend.
+
+* Knox: get URLs that need to be converted into remote paths
+
+* Cloud-based deployments: work on virtual infrastructures where hostnames are unpredictable.
+
+# Open Source Registry code
+
+What can we use to implement this from ASF and ASF-compatible code? 
+
+## Zookeeper
+
+We'd need a good reason not to use this. There are still some issues:
+
+1. Limits on amount of published data?
+
+2. Load limits, especially during cluster startup, or if a 500-mapper job all wants to do a lookup.
+
+3. Security story
+
+4. Impact of other ZK load on the behaviour of the service registry -will it cause problems if overloaded -and are they recoverable?
+
+## Apache Curator
+
+Netflix's Curator framework -now [Apache Curator](http://curator.apache.org/)- adds a lot to make working with ZK easier, including pluggable retry policies, binding tools and other things.
+
+There is also its "experimental" [service discovery framework](http://curator.apache.org/curator-x-discovery-server/index.html), which
+
+1. Allows a service to register a URL with a name and unique ID (and custom metadata). Multiple services of a given name can be registered
+
+2. Allows a service to register >1 URL.
+
+3. Has a service client which performs lookup and can cache results.
+
+4. Has a REST API
+
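+As a concrete illustration, a minimal registration-and-lookup sketch against
+the curator-x-discovery API (the ZK quorum, base path and port here are invented):
+
+    import java.util.Collection;
+    import org.apache.curator.framework.CuratorFramework;
+    import org.apache.curator.framework.CuratorFrameworkFactory;
+    import org.apache.curator.retry.ExponentialBackoffRetry;
+    import org.apache.curator.x.discovery.ServiceDiscovery;
+    import org.apache.curator.x.discovery.ServiceDiscoveryBuilder;
+    import org.apache.curator.x.discovery.ServiceInstance;
+
+    public class DiscoveryDemo {
+      public static void main(String[] args) throws Exception {
+        CuratorFramework zk = CuratorFrameworkFactory.newClient(
+            "zkhost:2181", new ExponentialBackoffRetry(1000, 3));
+        zk.start();
+
+        // Register one instance of a service named "slider"
+        ServiceInstance<Void> me = ServiceInstance.<Void>builder()
+            .name("slider").address("host1").port(62552).build();
+        ServiceDiscovery<Void> discovery = ServiceDiscoveryBuilder.builder(Void.class)
+            .client(zk).basePath("/services").thisInstance(me).build();
+        discovery.start();
+
+        // Clients enumerate all registered instances sharing that name
+        Collection<ServiceInstance<Void>> found = discovery.queryForInstances("slider");
+        System.out.println(found);
+      }
+    }
+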
+Limitations
+
+* The service discovery web UI and client do not work with the version of
+Jackson (1.8.8) in Hadoop 2.4. The upgraded version in Hadoop 2.5 is compatible ([HADOOP-10104](https://issues.apache.org/jira/browse/HADOOP-10104)).
+
+* The per-entry configuration payload attempts to get Jackson to perform Object/JSON mapping with the classname provided as an attribute in the JSON. This destroys the ability of arbitrary applications -and cross-language clients- to parse the published data; it is brittle and morally wrong from a data-sharing perspective.
+
+    {
+    
+      "name" : "name",
+      "id" : "service",
+      "address" : "localhost",
+      "port" : 8080,
+      "sslPort" : 443,
+      "payload" : {
+        "@class" : "org.apache.slider.core.registry.ServiceInstanceData",
+        "externalView" : {
+          "key" : "value"
+        }
+      },
+      "registrationTimeUTC" : 1397249829062,
+      "serviceType" : "DYNAMIC",
+      "uriSpec" : {
+        "parts" : [ {
+          "value" : "http:",
+          "variable" : false
+        }, {
+          "value" : ":",
+          "variable" : false
+        } ]
+      }
+    }
+
+
+
+## [Helix Service Registry](http://helix.apache.org/0.7.0-incubating-docs/recipes/service_discovery.html)
+
+This is inside Helix somewhere, used at LinkedIn in production at scale -and worth looking at. LinkedIn separate their Helix ZooKeeper quorum from their application-layer quorum, to isolate load.
+
+Notable features
+
+1. The registry is also the liveness view of the deployed application. Clients aren't watching the service registry for changes; they are watching Helix's model of the deployed application.
+1. The deployed application can pick up changes to its state the same way, allowing for live application manipulation.
+1. Tracks nodes that continually join/leave the group and drops them as unreliable.
+
+## Twill Service Registry
+
+Twill's [service registry code](http://twill.incubator.apache.org/apidocs/index.html) lets applications register a [(hostname, port)](http://twill.incubator.apache.org/apidocs/org/apache/twill/discovery/Discoverable.html) pair in the registry under a name, by which clients can look up and enumerate all services with that name.
+
+Clients can subscribe to changes in the list of services with a specific name -picking up the arrival and departure of instances- and can probe to see if a previously discovered entity is still registered.
+
+ZooKeeper-based and in-memory registry implementations are provided.
+
+One nice feature of this architecture -and Twill in general- is that its single-method callback model means that it segues nicely into Java 8 lambda expressions. This is something to retain in a YARN-wide service registry.
+
+Comparing it to Curator, it offers a proper subset of Curator's registered services [ServiceInstance](http://curator.apache.org/apidocs/org/apache/curator/x/discovery/ServiceInstance.html) -implying that you could publish and retrieve Curator-registered services via a new implementation of Twill's DiscoveryService. This would require extensions to the Curator service discovery client to allow ZK nodes to be watched for changes. This is a feature that would be useful in many use cases -such as watching service availability across a cluster, or simply blocking until a dependent service was launched.
+
+As with Curator, the amount of information that can be published isn't enough for management tools to make effective use of the service registration, while for Slider there's no way to publish configuration data. However, a YARN registry will inevitably be a superset of the Twill client's enumerated and retrieved data -so if its registration API were sufficient to register a minimal service, supporting the YARN registry via Twill's existing API should be straightforward.
+
+## Twitter Commons Service Registration
+
+[Twitter Commons](https://github.com/twitter/commons) has a service registration library which allows for registration of sets of servers, [publishing the hostname and port of each](http://twitter.github.io/commons/apidocs/com/twitter/common/service/registration/package-tree.html), along with a map of string properties.
+
+ZooKeeper-based, it suffices if all servers are identical and only publish single (hostname, port) pairs for callers.
+
+## AirBnB Smartstack
+
+SmartStack is [AirBnB's cloud-based service discovery system](http://nerds.airbnb.com/smartstack-service-discovery-cloud/).
+
+It has two parts, *Nerve* and *Synapse*:
+
+[**Nerve**](https://github.com/airbnb/nerve) is a Ruby agent designed to monitor processes and register healthy instances in ZK (or to a mock reporter). It includes [probes for TCP ports, HTTP and RabbitMQ](https://github.com/airbnb/nerve/tree/master/lib/nerve/service_watcher). It's [a fairly simple liveness monitor](https://github.com/airbnb/nerve/blob/master/lib/nerve/service_watcher.rb).
+
+[**Synapse**](https://github.com/airbnb/synapse) takes the data and uses it to configure [HAProxy instances](http://haproxy.1wt.eu/). HAProxy handles the load balancing, queuing and integrating liveness probes into the queues. Synapse generates all the configuration files for an instance -but also tries to reconfigure the live instances via their socket APIs.
+
+Alongside these, AirBnB have another published project on GitHub, [Optica](https://github.com/airbnb/optica), which is a web application for nodes to register themselves with (POST) and for others to query. It publishes events to RabbitMQ, and again uses ZK to store state.
+
+AirBnB do complain a bit about ZK and its brittleness, though they suspect this is due to bugs in the Ruby ZK client library. This may be exacerbated by in-cloud deployments. Hard-coding the list of ZK nodes may work for a physical cluster, but in a virtualized cluster the hostnames/IP addresses of those nodes may change -leading to a meta-discovery problem: how to find the ZK quorum -especially if you can't control the DNS servers.
+
+## [Apache Directory](http://directory.apache.org/apacheds/)
+
+This is an embeddable LDAP server:
+
+* Embeddable inside Java apps
+
+* Supports Kerberos alongside X.500 auth. It can actually act as a key server and ticket-granting service if desired.
+
+* Supports DNS and DHCP queries.
+
+* Accessible via classic LDAP APIs.
+
+This isn't a registry service directly, though LDAP queries do make enumeration of services *and configuration data* straightforward. As LDAP libraries are common across languages -even built in to the Java runtime- LDAP support makes publishing information to arbitrary clients relatively straightforward.
+
+If service information were to be published via LDAP, then it should allow IT-managed LDAP services to both host this information, and publish configuration data. This would be relevant for classic Hadoop applications if we were to move the Configuration class to support back-end configuration sources beyond XML files on the classpath.
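+
+To show how little client-side machinery that would need, here is a lookup
+sketch using only the JDK's built-in JNDI LDAP client -the server URL and DN
+layout are invented for the example:
+
+    import java.util.Hashtable;
+    import javax.naming.Context;
+    import javax.naming.directory.Attributes;
+    import javax.naming.directory.DirContext;
+    import javax.naming.directory.InitialDirContext;
+
+    public class LdapLookup {
+      public static void main(String[] args) throws Exception {
+        Hashtable<String, String> env = new Hashtable<>();
+        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
+        env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389"); // invented server
+        DirContext ctx = new InitialDirContext(env);
+        // Invented DN scheme: one entry per registered service instance
+        Attributes attrs = ctx.getAttributes("cn=ingress,ou=services,dc=example,dc=com");
+        System.out.println(attrs);  // endpoint attributes for the service
+        ctx.close();
+      }
+    }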
+

Propchange: incubator/slider/site/trunk/content/design/registry/a_YARN_service_registry.md
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/design/registry/index.md
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/registry/index.md?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/design/registry/index.md (added)
+++ incubator/slider/site/trunk/content/design/registry/index.md Tue Jun 24 19:46:37 2014
@@ -0,0 +1,47 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+  
+# Apache Slider Service Registry
+
+The service registry is a core part of the Slider Architecture -it is how
+dynamically generated configurations are published for clients to pick up.
+
+The need for a service registry goes beyond Slider, however. We effectively
+have application-specific registries for HBase and Accumulo, and explicit
+registries in Apache Helix and Apache Twill, as well as re-usable registry
+code in Apache Curator.
+
+[YARN-913](https://issues.apache.org/jira/browse/YARN-913) covers the need
+for YARN itself to have a service registry. This would be the ideal ultimate
+solution -it would operate at a fixed location/ZK path, and would be guaranteed
+to be on all YARN clusters, so code could be written expecting it to be there.
+
+It could also be used to publish binding data from static applications,
+including HBase, Accumulo and Oozie -applications deployed by management tools.
+Unless/until these applications self-published their binding data, it would
+be the duty of the management tools to do the registration.
+
+
+
+## Contents
+
+1. [YARN Application Registration and Binding: the Problem](the_YARN_application_registration_and_binding_problem.html)
+1. [A YARN Service Registry](a_YARN_service_registry.html)
+1. [April 2014 Initial Registry Design](initial_registry_design.html)
+1. [Service Registry End-to-End Scenarios](service_registry_end_to_end_scenario.html)
+1. [P2P Service Registries](p2p_service_registries.html)
+1. [References](references.html)

Propchange: incubator/slider/site/trunk/content/design/registry/index.md
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/design/registry/initial_registry_design.md
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/registry/initial_registry_design.md?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/design/registry/initial_registry_design.md (added)
+++ incubator/slider/site/trunk/content/design/registry/initial_registry_design.md Tue Jun 24 19:46:37 2014
@@ -0,0 +1,110 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+# April 2014 Initial Registry Design for Apache Slider
+
+This is the plan for the initial registry design.
+
+1. Use Apache Curator [service discovery code](http://curator.apache.org/curator-x-discovery/index.html). 
+
+2. AMs to register as (user, name). Maybe "service type" if we add that as an option in the slider configs
+
+3. Lift "external view" term from Helix -concept that this is the public view, not internal.
+
+4. application/properties section to list app-wide values
+
+5. application/services section to list public service URLs; publish each as unique-ID -> (human name, URL, human text). Code can resolve from the unique ID; UIs can use the human-readable data.
+
+6. String Template 2 templates for generation of output (rationale: libraries exist for Python, Java and .NET)
+
+7. Java CLI to retrieve values from ZK and apply named template (local, hdfs). Include ability to restrict to list of named properties (pattern match).
+
+8. AM to serve up curator service (later -host in RM? elsewhere?)
+
+### Forwards compatibility
+
+1. This initial design will hide the fact that Apache Curator is being used to discover services,
+by storing information in the payload, `ServiceInstanceData`, rather than in (the minimal) Curator
+service entries themselves. If we move to an alternate registry, provided we
+can use the same datatype -or map to it- changes should not be visible.
+
+1. The first implementation will not support watching for changes.
+
+### Initial templates 
+
+* hadoop XML conf files
+
+* Java properties file
+
+* HTML listing of services
+
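+For instance, the Java properties rendering of a published configuration
+might look like this (keys and values invented):
+
+    hbase.zookeeper.quorum=host1,host2
+    hbase.tmp.dir=/tmp/hbase-alice
+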
+
+
+## Example Curator Service Entry
+
+This is the prototype's content.
+
+Top-level entry:
+
+    service CuratorServiceInstance{name='slider', id='stevel.test_registry_am', address='192.168.1.101', port=62552, sslPort=null, payload=org.apache.slider.core.registry.info.ServiceInstanceData@4e9af21b, registrationTimeUTC=1397574073203, serviceType=DYNAMIC, uriSpec=org.apache.curator.x.discovery.UriSpec@ef8dacf0} 
+
+Slider payload:
+
+    payload=
+    {
+      "internalView" : {
+        "endpoints" : {
+          "/agents" : {
+            "value" : "http://stevel-8.local:62552/ws/v1/slider/agents",
+            "protocol" : "http",
+            "type" : "url",
+            "description" : "Agent API"
+          }
+        },
+        "settings" : { }
+      },
+    
+      "externalView" : {
+        "endpoints" : {
+          "/mgmt" : {
+            "value" : "http://stevel-8.local:62552/ws/v1/slider/mgmt",
+            "protocol" : "http",
+            "type" : "url",
+            "description" : "Management API"
+          },
+    
+          "slider/IPC" : {
+            "value" : "stevel-8.local/192.168.1.101:62550",
+            "protocol" : "org.apache.hadoop.ipc.Protobuf",
+            "type" : "address",
+            "description" : "Slider AM RPC"
+          },
+          "registry" : {
+            "value" : "http://stevel-8.local:62552/ws/registry",
+            "protocol" : "http",
+            "type" : "url",
+            "description" : "Registry"
+          }
+        },
+        "settings" : { }
+      }
+    }
+

Propchange: incubator/slider/site/trunk/content/design/registry/initial_registry_design.md
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/design/registry/p2p_service_registries.md
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/registry/p2p_service_registries.md?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/design/registry/p2p_service_registries.md (added)
+++ incubator/slider/site/trunk/content/design/registry/p2p_service_registries.md Tue Jun 24 19:46:37 2014
@@ -0,0 +1,137 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+  
+# P2P Service Registries for Apache Slider
+
+Alongside the centralized service registries, there's much prior work on
+P2P discovery systems, especially for mobile and consumer devices.
+
+They perform some multicast- or distributed hash table-based lookup,
+and tend to have common limitations:
+
+* scalability
+
+* the bootstrapping problem
+
+* security: can you trust the results to be honest?
+
+* consistency: can you trust the results to be complete and current?
+
+Bootstrapping is usually done via multicast, possibly then switching
+to unicast for better scale. As multicasting doesn't work in cloud
+infrastructures, none of the services work unmodified in public
+clouds. There are multiple anecdotes of
+[Amazon's SimpleDB service](http://aws.amazon.com/simpledb/) being used as a
+registry for in-EC2 applications. At the very least, this service and its
+equivalents in other cloud providers could be used to bootstrap ZK client
+bindings in cloud environments. 
+
+## Service Location Protocol 
+
+Service Location Protocol is a protocol for discovering services that came out
+of Sun, Novell and others -it is still in use for printer discovery and
+suchlike.
+
+It supports both a multicast discovery mechanism, and a unicast protocol
+to talk to a Directory Agent -an agent that is itself discovered by multicast
+requests, or by listening for the agent's intermittent multicast announcements.
+
+There's an extension to DHCP, RFC2610, which added the ability for DHCP to
+advertise Directory Agents -this was designed to solve the bootstrap problem
+(though not necessarily security or in-cloud deployment). Apart from a few
+mentions in Windows Server technical notes, it does not appear to be in use.
+
+* [[RFC2608](http://www.ietf.org/rfc/rfc2608.txt)] *Service Location Protocol, Version 2* , IEEE, 1999
+
+* [[RFC3224](http://www.ietf.org/rfc/rfc3224.txt)] *Vendor Extensions for Service Location Protocol, Version 2*, IETF, 2003
+
+* [[RFC2610](http://www.ietf.org/rfc/rfc2610.txt)] *DHCP Options for Service Location Protocol*, IETF, 1999
+
+## [Zeroconf](http://www.zeroconf.org/)
+
+The multicast discovery service implemented in Apple's Bonjour system
+-multicasting DNS lookups to all peers in the subnet.
+
+This allows for URLs and hostnames to be dynamically positioned, with
+DNS domain searches allowing for enumeration of service groups. 
+
+This protocol scales very badly; the load on *every* client in the
+subnet is O(DNS-queries-across-subnet), hence implicitly `O(devices)*O(device-activity)`. 
+
+The special domains `_tcp.`, `_udp.`  and their subdomains can also be
+served up via a normal DNS server.
+
+##  [Jini/Apache River](http://river.apache.org/doc/specs/html/lookup-spec.html)
+
+Attribute-driven service enumeration, which drives the Java-client-only
+model of downloading client-side code. There's no requirement for the remote
+services to be in Java, only that the drivers are.
+
+## [Serf](http://www.serfdom.io/)  
+
+This is a library that implements the [SWIM protocol](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf) to propagate information around a cluster. Apparently works in virtualized clusters too. It's already been used in a Flume-on-Hoya provider.
+
+## [Anubis](http://sourceforge.net/p/smartfrog/svn/HEAD/tree/trunk/core/components/anubis/)
+
+An HP Labs-built [High Availability tuple-space](http://sourceforge.net/p/smartfrog/svn/HEAD/tree/trunk/core/components/anubis/doc/HPL-2005-72.pdf?format=raw) in SmartFrog; used in production in some of HP's telco products. An agent publishes facts into the T-Space, and within one heartbeat all other agents have it. One heartbeat later, unless there's been a change in the membership, the publisher knows the others have it. One heartbeat later the agents know the publisher knows it, etc.
+
+Strengths: 
+
+* The shared knowledge mechanism permits reasoning and mathematical proofs.
+
+* Strict ordering between heartbeats implies an ordering in receipt.
+This is stronger than ZK's guarantees.
+
+* Lets you share a moderate amount of data (the longer the heartbeat
+interval, the more data you can publish).
+
+* Provided the JVM hosting the Anubis agent is also hosting the service,
+liveness is implicit.
+
+* Secure to the extent that it can be locked down to allow only nodes with
+mutual trust of HTTPS certificates to join the tuple-space.
+
+Weaknesses
+
+* (Currently) bootstraps via multicast discovery.
+
+* Brittle to timing, especially on virtualized clusters where clocks are unpredictable.
+
+It proved good for workload sharing -tasks can be published to it, and any
+agent can say "I'm working on it" and take up the work. If the process
+fails, the task becomes available again. We used this for distributed scheduling in a rendering farm.
+
+## [Carmen](http://www.hpl.hp.com/techreports/2002/HPL-2002-257)
+
+This was another HP Labs project, related to the Cooltown "ubiquitous
+computing" work, which was a decade too early to be relevant. It was
+also positioned by management as a B2B platform, so ended up competing
+with - and losing against - WS-* and UDDI. 
+
+Carmen aimed to provide service discovery for both fixed services and
+highly mobile client services that roam around the network -they
+are assumed to be wireless devices.
+
+Services were published with, and searched for by, attributes; locality
+was considered a key attribute, with local instances of a service
+prioritized. Those services with a static location and low rate of
+change became the stable caches of service information -becoming,
+as with Skype, "supernodes". 
+
+Bootstrapping the cluster relied on multicast, though alternatives
+based on DHCP and DNS were proposed.
+

Propchange: incubator/slider/site/trunk/content/design/registry/p2p_service_registries.md
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/design/registry/references.md
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/registry/references.md?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/design/registry/references.md (added)
+++ incubator/slider/site/trunk/content/design/registry/references.md Tue Jun 24 19:46:37 2014
@@ -0,0 +1,49 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+  
+# Service Registry References
+
+Service registration and discovery is a problem in distributed computing that has been explored for over thirty years, with
+[Birrell81]'s *Grapevine* system the first known implementation.
+
+# Papers
+
+* **[Birrell81]** Birrell, A. et al, [*Grapevine: An exercise in distributed computing*](http://research.microsoft.com/apps/pubs/default.aspx?id=63661). Comm. ACM 25, 4 (Apr 1982), pp260-274. 
+The first documented directory service; relied on service shutdown to resolve update operations.
+
+* **[Das02]** [*SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol*](http://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf)
+P2P gossip-style data sharing protocol with random liveness probes to address scalable liveness checking. Ceph uses similar liveness checking.
+
+* **[Marti02]** Marti S. and Krishnam V., [*Carmen: A Dynamic Service Discovery Architecture*](http://www.hpl.hp.com/techreports/2002/HPL-2002-257), HP Laboratories Technical Report HPL-2002-257, 2002.
+
+* **[Lampson86]** Lampson, B. [*Designing a Global Naming Service*](http://research.microsoft.com/en-us/um/people/blampson/36-GlobalNames/Acrobat.pdf). DEC. 
+Distributed; includes an update protocol and the ability to add links to other parts of the tree. Also refers to [*Xerox Clearinghouse*](http://bitsavers.informatik.uni-stuttgart.de/pdf/xerox/parc/techReports/OPD-T8103_The_Clearinghouse.pdf), which apparently shipped.
+
+* **[Mockapetris88]** Mockapetris, P. [*Development of the domain name system*](http://bnrg.eecs.berkeley.edu/~randy/Courses/CS268.F08/papers/31_dns.pdf). The history of DNS.
+
+* **[Schroeder84]** Schroeder, M.D. et al, [*Experience with Grapevine: The Growth of a Distributed System*](http://research.microsoft.com/apps/pubs/default.aspx?id=61509). Xerox.
+Writeup of the experiences of using Grapevine, with its eventual consistency and lack of idempotent message delivery called out -along with coverage of operations issues.
+
+* **[van Renesse08]** van Renesse, R. et al, [*Astrolabe: A Robust and Scalable Technology For Distributed System Monitoring, Management, and Data Mining*](http://www.cs.cornell.edu/home/rvr/papers/astrolabe.pdf). ACM Transactions on Computer Systems.
+Grandest P2P management framework to date; the work that earned Werner Vogels his CTO position at Amazon.
+ 
+* **[van Steen86]** van Steen, M. et al, [*A Scalable Location Service for Distributed Objects*](http://www.cs.vu.nl/~ast/publications/asci-1996a.pdf).
+Vrije Universiteit, Amsterdam. Probably the first Object Request Broker.
+
+
+
+ 

Propchange: incubator/slider/site/trunk/content/design/registry/references.md
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/design/registry/registry-model.md
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/registry/registry-model.md?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/design/registry/registry-model.md (added)
+++ incubator/slider/site/trunk/content/design/registry/registry-model.md Tue Jun 24 19:46:37 2014
@@ -0,0 +1,75 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+  
+# Apache Slider: Service Registry
+
+The service registry model is designed to support dynamically
+deployed Slider applications, *and* statically deployed versions
+of the same application -provided the latter also registers itself,
+its public network services, and any configurations and files
+that it wishes clients to be able to retrieve.
+
+The architecture and implementation of this registry are not defined
+here -only the view of it seen by clients.
+
+1. A 'service registry' exists in the YARN cluster into which
+services can be registered. 
+
+1. There is no restriction on the number of services that can be registered in
+the registry, the type of service that may register, or on how many
+services an application running in the YARN cluster may register.
+
+1. Services are registered by their type, owner and name. As an example,
+Alice's slider-managed HBase cluster `ingress` would have a type `org.apache.hbase`,
+owner `alice` and name `ingress`. (One plausible registry layout is sketched after this list.)
+
+1. In the case of Slider-managed services, there is a separate slider instance
+registration which publishes information about slider itself. In the example
+above, this would be (`org.apache.slider`,`alice`,`ingress`).
+
+1. Services can publish information about themselves, with common entries being:
+
+    * service name and description.
+    * URLs of published web UIs and web service APIs
+    * network addresses of other protocols
+
+1. Services may also publish:
+    
+    * URLs to configuration details
+    * URLs documents published for use as client-side configuration data -either
+      directly or through some form of processing.
+    * public service-specific data, for use by applications that are aware of
+      the specific service type.
+    * internal service-specific data -for use by the components that comprise
+      an application. This allows the registry to be used to glue together
+      the application itself.
+      
+1. Services can be listed and examined.
+
+1. Service-published configuration key-value pairs can be retrieved by clients
+
+1. Service-published documents (and packages of such documents) can be
+retrieved by clients.
+
+1. There's no requirement for service instances to support any standard protocols.
+
+1. Some protocols are defined which they MAY implement. For example, the protocol
+to enumerate and retrieve configuration documents is designed to be implemented
+by any service that wishes to publish such content.
+
+1. In a secure cluster, the registry data may be restricted, along with any
+service protocols offered by the registered services. 
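+
+For illustration only -this is not a committed layout- the (type, owner, name)
+triple maps naturally onto a hierarchical path such as:
+
+    /services/org.apache.hbase/alice/ingress     # the application's own entry
+    /services/org.apache.slider/alice/ingress    # the Slider instance managing it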

Propchange: incubator/slider/site/trunk/content/design/registry/registry-model.md
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/design/registry/service_registry_end_to_end_scenario.md
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/registry/service_registry_end_to_end_scenario.md?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/design/registry/service_registry_end_to_end_scenario.md (added)
+++ incubator/slider/site/trunk/content/design/registry/service_registry_end_to_end_scenario.md Tue Jun 24 19:46:37 2014
@@ -0,0 +1,156 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+# Apache Slider: Service Registry End-to-End Scenarios
+
+## AM startup
+
+1. AM starts, reads in configuration, creates provider
+
+2. AM builds web site, involving provider in the process *(there's a possible race condition here, due to the AM registration sequence)*
+
+3. AM registers self with RM, including web and IPC ports, and receives list of existing containers; container loss notifications come in asynchronously *(which is why the AM startup process is in a synchronized block)*
+
+4. AM inits its `ApplicationState` instance with the config, instance description and RM-supplied container list.
+
+5. AM creates a service registry client using the ZK quorum and path provided when the AM was started
+
+6. AM registers standard endpoints: RPC, WebUI, REST APIs
+
+7. AM registers standard content it can serve (e.g. `yarn-site.xml`)
+
+8. AM passes registry to provider in `bind()` operation.
+
+9. AM triggers review of application state, requesting/releasing nodes as appropriate
+
+## Agent Startup: standalone
+
+1. Container is issued to AM
+
+2. AM chooses component, launches agent on it -with the URL of the AM as a parameter (TODO: Add registry bonding of ZK quorum and path)
+
+3. Agent starts up.
+
+4. Agent locates AM via URL/ZK info
+
+5. Agent heartbeats in with state
+
+6. AM gives the agent the next state command *(see the sketch after this list)*
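+
+A minimal sketch of the heartbeat exchange in steps 5-6, assuming a
+hypothetical JSON REST endpoint on the AM; the path, payload and response
+handling are illustrative only -the real agent protocol may differ:
+
+    import java.io.OutputStream;
+    import java.net.HttpURLConnection;
+    import java.net.URL;
+    import java.nio.charset.StandardCharsets;
+    import java.util.Scanner;
+
+    public class AgentHeartbeatSketch {
+      public static void main(String[] args) throws Exception {
+        // The AM URL and endpoint path are hypothetical.
+        URL url = new URL("http://am-host:8080/ws/v1/slider/agents/agent1/heartbeat");
+        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
+        conn.setRequestMethod("POST");
+        conn.setDoOutput(true);
+        conn.setRequestProperty("Content-Type", "application/json");
+
+        // Post the agent's current state (payload illustrative).
+        try (OutputStream out = conn.getOutputStream()) {
+          out.write("{\"state\":\"STARTED\"}".getBytes(StandardCharsets.UTF_8));
+        }
+
+        // Read back the AM's next-state command.
+        try (Scanner in = new Scanner(conn.getInputStream(), "UTF-8")) {
+          System.out.println("next command: " + in.useDelimiter("\\A").next());
+        }
+      }
+    }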
+
+## AM gets state from agent
+
+1. Agent heartbeats in
+
+2. AM decides if it wants to receive config 
+
+3. AM issues request for state information -all (dynamic) config data
+
+4. Agent receives it
+
+5. Agent returns all config state -hostnames, allocated ports, generated values (e.g. database connection strings, URLs)- as a two-level structure, which allows the agent to define which config options are relevant to which document *(see the sketch after this list)*
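+
+One way to picture that two-level structure is a map from target document to
+the key-value pairs relevant to it. This is a sketch of the data shape only,
+with illustrative keys and values, not the agent's actual wire format:
+
+    import java.util.HashMap;
+    import java.util.Map;
+
+    public class ConfigStateSketch {
+      public static void main(String[] args) {
+        // Outer key: the document a config option belongs to.
+        Map<String, Map<String, String>> configState = new HashMap<>();
+
+        Map<String, String> hbaseSite = new HashMap<>();
+        hbaseSite.put("hbase.master.info.port", "48021");      // allocated port
+        hbaseSite.put("hbase.zookeeper.quorum", "node1:2181"); // generated value
+
+        configState.put("hbase-site.xml", hbaseSite);
+        System.out.println(configState);
+      }
+    }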
+
+## AM saves state for serving
+
+1. AM saves state in RAM (assumptions: small, will rebuild on restart)
+
+2. AM updates service registry with list of content that can be served up and URLs to retrieve them.
+
+3. AM fields HTTP GET requests on content
+
+## AM Serves content
+
+A simple REST service serves up content on paths published to the service registry. Published documents can also be enumerated by GET operations on parent paths.
+
+1. On a GET request, the AM locates the referenced agent values
+
+2. AM builds the response document from the K-V pairs. This can be in a limited set of formats (Hadoop XML, Java properties, YAML, CSV, HTTP, JSON), chosen as a `?` type param *(this generation is done by template processing in the AM using the `slider.core.template` module; see the sketch after this list)*
+
+3. The response is streamed with the headers `content-type` and `content-length`, plus do-not-cache-in-proxy and expires *(with the expiry date chosen as ??)*
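+
+As an illustration of step 2, here is a minimal sketch of rendering one
+document's K-V pairs in the Java properties format; a renderer per format
+would be needed, and the pairs shown are illustrative:
+
+    import java.io.StringWriter;
+    import java.util.LinkedHashMap;
+    import java.util.Map;
+    import java.util.Properties;
+
+    public class PropertiesRenderSketch {
+      public static void main(String[] args) throws Exception {
+        // The pairs would come from the agent-supplied state.
+        Map<String, String> pairs = new LinkedHashMap<>();
+        pairs.put("hbase.zookeeper.quorum", "node1:2181");
+
+        Properties props = new Properties();
+        pairs.forEach(props::setProperty);
+
+        // Render to a string; the AM would stream this back with
+        // content-type and content-length headers.
+        StringWriter out = new StringWriter();
+        props.store(out, "generated by the AM");
+        System.out.print(out);
+      }
+    }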
+
+# Slider Client
+
+Currently the slider client enumerates the YARN registry looking for slider instances -including any running instances of the same application- before launching a cluster.
+
+This approach:
+
+* has race conditions
+* has scale limitations `O(apps-in-YARN-cluster)` + `O(completed-apps-in-RM-memory)`
+* only retrieves configuration information from slider-deployed application instances. *We do not need to restrict ourselves here.*
+
+## Slider Client lists applications
+
+    slider registry --list [--servicetype <application-type>]
+
+1. Client starts
+
+2. Client creates a service registry client using the ZK quorum and path provided in the client config properties (slider-client.xml)
+
+3. Client enumerates registered services and lists them *(see the sketch after this list)*
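+
+A minimal sketch of steps 2-3, assuming services are registered as znodes
+under a base path read from slider-client.xml; the quorum address and path
+are illustrative, not the registry's actual layout:
+
+    import org.apache.curator.framework.CuratorFramework;
+    import org.apache.curator.framework.CuratorFrameworkFactory;
+    import org.apache.curator.retry.ExponentialBackoffRetry;
+
+    public class RegistryListSketch {
+      public static void main(String[] args) throws Exception {
+        CuratorFramework curator = CuratorFrameworkFactory.newClient(
+            "zk1:2181", new ExponentialBackoffRetry(1000, 3));
+        curator.start();
+        // One child znode per registered service instance (path hypothetical).
+        for (String name : curator.getChildren().forPath("/registry")) {
+          System.out.println(name);
+        }
+        curator.close();
+      }
+    }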
+
+## Slider Client lists content published by an application instance
+
+    slider registry <instance> --listconf  [--servicetype <application-type>]
+
+1. Client starts
+
+2. Client creates a service registry client using the ZK quorum and path provided in the client config properties (slider-client.xml)
+
+3. Client locates registered service entry -or fails
+
+4. Client retrieves service data, specifically the listing of published documents
+
+5. Client displays list of content
+
+## Slider Client retrieves content published by an application instance
+
+    slider registry <instance> --getconf <document> [--format (xml|properties|text|html|csv|yaml|json,...)] [--dest <file>] [--servicetype <application-type>]
+
+1. Client starts
+
+2. Client creates a service registry client using the ZK quorum and path provided in the client config properties (slider-client.xml)
+
+3. Client locates registered service entry -or fails
+
+4. Client retrieves service data, specifically the listing of published documents
+
+5. Client locates URL of content
+
+6. Client builds GET request including format
+
+7. Client executes command, follows redirects, validates content length against supplied data.
+
+8. Client prints the response to the console or saves it to an output file: the path specified as the destination or, if that path refers to a directory, a file underneath it *(see the sketch after this list)*
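+
+A minimal sketch of steps 6-8, using plain `HttpURLConnection`; the URL, the
+query parameter name and the destination handling are illustrative
+assumptions, not the actual client API:
+
+    import java.io.InputStream;
+    import java.net.HttpURLConnection;
+    import java.net.URL;
+    import java.nio.file.Files;
+    import java.nio.file.Path;
+    import java.nio.file.Paths;
+
+    public class RegistryGetconfSketch {
+      public static void main(String[] args) throws Exception {
+        // The document URL comes from the registry entry; the format
+        // parameter name is a guess.
+        URL url = new URL("http://am-host:8080/conf/yarn-site.xml?format=xml");
+        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
+        conn.setInstanceFollowRedirects(true);
+        long expected = conn.getContentLengthLong();
+
+        // If --dest names a directory, save to a file underneath it.
+        Path dest = Paths.get("out");
+        if (Files.isDirectory(dest)) {
+          dest = dest.resolve("yarn-site.xml");
+        }
+        try (InputStream in = conn.getInputStream()) {
+          long copied = Files.copy(in, dest);
+          if (expected >= 0 && copied != expected) {
+            throw new IllegalStateException(
+                "content length mismatch: " + copied + " != " + expected);
+          }
+        }
+      }
+    }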
+
+## Slider Client retrieves content set published by an application instance
+
+Here a set of published documents is retrieved in the application's desired format.
+
+## Slider Client retrieves document and applies template to it
+
+Here a document is retrieved and a template is applied to it to generate the final output.
+
+    slider registry <instance> --source <document> [--template <path-to-template>] [--outfile <file>]  [--servicetype <application-type>]
+
+1. The document is retrieved as before, using a simple format such as JSON.
+
+2. The document is parsed and converted back into K-V pairs
+
+3. A template, using a common/defined template library, is applied to the content, generating the final output.
+
+Template paths may include local filesystem paths or (somehow) something in a package file. A sketch of the retrieve-parse-expand flow follows.
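+
+This is a minimal sketch only, assuming Jackson for JSON parsing and a naive
+`${key}` substitution standing in for the real template library; the file
+paths and pairs are illustrative:
+
+    import java.nio.charset.StandardCharsets;
+    import java.nio.file.Files;
+    import java.nio.file.Paths;
+    import java.util.Map;
+
+    import com.fasterxml.jackson.databind.ObjectMapper;
+
+    public class TemplateApplySketch {
+      public static void main(String[] args) throws Exception {
+        // Steps 1-2: parse the retrieved JSON document back into K-V pairs.
+        String json = "{\"hbase.zookeeper.quorum\":\"node1:2181\"}";
+        ObjectMapper mapper = new ObjectMapper();
+        Map<String, String> pairs = mapper.readValue(json,
+            mapper.getTypeFactory().constructMapType(
+                Map.class, String.class, String.class));
+
+        // Step 3: apply the template; ${key} expansion is the stand-in here.
+        String template = new String(
+            Files.readAllBytes(Paths.get("template.txt")), StandardCharsets.UTF_8);
+        for (Map.Entry<String, String> e : pairs.entrySet()) {
+          template = template.replace("${" + e.getKey() + "}", e.getValue());
+        }
+        Files.write(Paths.get("output.txt"),
+            template.getBytes(StandardCharsets.UTF_8));
+      }
+    }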
+

Propchange: incubator/slider/site/trunk/content/design/registry/service_registry_end_to_end_scenario.md
------------------------------------------------------------------------------
    svn:eol-style = native

Added: incubator/slider/site/trunk/content/design/registry/the_YARN_application_registration_and_binding_problem.md
URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/registry/the_YARN_application_registration_and_binding_problem.md?rev=1605167&view=auto
==============================================================================
--- incubator/slider/site/trunk/content/design/registry/the_YARN_application_registration_and_binding_problem.md (added)
+++ incubator/slider/site/trunk/content/design/registry/the_YARN_application_registration_and_binding_problem.md Tue Jun 24 19:46:37 2014
@@ -0,0 +1,192 @@
+<!---
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+# Apache Slider YARN Application Registration and Binding: the Problem
+
+## March 2014
+
+# How to bind client applications to dynamically placed services?
+
+
+There are some constraints here:
+
+1. The clients may be running outside the cluster -potentially over long-haul links.
+
+1. The location of an application deployed in a YARN cluster cannot be predicted.
+
+1. The ports used for application service endpoints cannot be hard-coded
+or predicted. (Alternatively: if they are hard-coded, then Socket-In-Use exceptions may occur)
+
+1. As components fail and get re-instantiated, their location may change.
+The rate of this depends on cluster and application stability; the
+longer-lived the application, the more common it is.
+
+Existing Hadoop client apps have a configuration problem of their own:
+how are the settings in files such as `yarn-site.xml` picked up by today's
+applications? This is an issue which has historically been out of scope
+for Hadoop clusters -but if we are looking at registration and binding
+of YARN applications, there should be no reason why
+static applications cannot be discovered and bonded to using the same mechanisms.
+
+# Other constraints:
+
+1. Reduce the amount of change needed in existing applications to a minimum,
+ideally none, though some pre-launch setup may be acceptable.
+
+2. Prevent malicious applications from registering service endpoints.
+
+3. Scale with the # of applications and # of clients, and do not overload during a cluster partition.
+
+4. Offer a design that works with apps that are deployed in a YARN cluster
+outside of Slider. Rationale: we want a mechanism that works with pure-YARN apps.
+
+## Possible Solutions:
+
+### ZK
+
+Client applications use ZK to find services (addresses #1, #2 and #3).
+Requires location code in the client.
+
+HBase and Accumulo do this as part of a failover-ready design.
+
+### DNS
+
+Client apps use DNS to find services, with a custom DNS server for a
+subdomain representing YARN services. Addresses #1; with a shortened TTL and
+no DNS address caching, #3. #2 is addressed only if other DNS entries are used
+to publish service entries.
+
+Should support existing applications, with a configuration that is stable
+over time. It does require the clients to not cache DNS addresses forever
+(this must be explicitly set on Java applications,
+irrespective of the published TTL -see the sketch below). It generates a load
+on the DNS servers that is `O(clients/TTL)`.
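+
+For the Java caching caveat, the JVM-wide control is the
+`networkaddress.cache.ttl` security property; a minimal sketch, with the TTL
+value chosen arbitrarily:
+
+    import java.security.Security;
+
+    public class DnsTtlSketch {
+      public static void main(String[] args) {
+        // Without this, the JVM may cache successful DNS lookups indefinitely,
+        // regardless of the record's published TTL.
+        Security.setProperty("networkaddress.cache.ttl", "30"); // seconds
+      }
+    }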
+
+Google Chubby offers a DNS service to handle this. ZK does not -yet.
+
+### Floating IP Addresses
+
+If the clients know/cache IP addresses of services, these addresses could be
+floated across service instances. Linux HA has floating IP address support,
+while Docker containers can make use of them, especially if an integrated DHCP
+server handles the assignment of IP addresses to specific containers. 
+
+ARP caching is the inevitable problem here, but it is still less brittle than
+relying on applications to know not to cache IP addresses -nor does it
+place as much load on DNS servers as short-TTL DNS entries do.
+
+### LDAP
+
+Enterprise Directory services are used to publish/locate services. Requires
+lookup into the directory on binding (#1, #2), re-lookup on failure (#3).
+LDAP permissions can prevent untrusted applications from registering.
+
+* Works well with Windows registries.
+
+* Less common Java-side, though possible -and implemented in the core Java
+libraries. Spring-LDAP is focused on connection to an LDAP server
+-not LDAP-driven application config.
+
+### Registration Web Service
+
+Custom web-service registration services could be used.
+
+* The sole WS-* registration service, UDDI, does not have a REST equivalent;
+DNS is assumed to take on that role.
+
+* Requires new client-side code anyway.
+
+### Zookeeper URL Schema
+
+Offer our own `zk://` URL schema, with Java & .NET implementations (others?) to resolve it, plus browser plugins.
+
+* Would address requirements #1 & #3
+
+* Cost: non-standard; needs an extension for every application/platform, and
+will not work with tools such as curl or web browsers.
+
+### AM-side config generation
+
+App-side config generation: YARN applications generate client-side
+configuration files containing launch-time information (#1, #2).
+The AM can create these dynamically, and as the storage load is all in
+the AM, this does not consume as many resources in a central server as
+publishing it all to that central server would.
+
+* Requires the application to know of the client-side applications to support
+-and to be able to generate their configuration information (i.e. formatted files).
+
+* Requires the AM to get from the deployed application components all the
+information needed to generate bindings. Unless the AM can resolve YARN App
+templates, there needs to be a way to get one of the components in the app to
+generate settings for the entire cluster and push them back.
+
+* Needs to be repeated for all YARN apps, however deployed.
+
+* Needs something similar for statically deployed applications.
+
+
+### Client-side config generation
+
+YARN app to publish attributes as key-val pairs, with client-side code to read
+them and generate configs (#1, #2). Example configuration generators could
+create: Hadoop-client XML, Spring, Tomcat, Guice configs, something for .NET.
+A sketch of one such generator follows the list below.
+
+* Not limited to Hoya application deployments only.
+
+* K-V pairs need to be published "somewhere". A structured section in the
+ZK tree per app is the obvious location -though potentially expensive. An
+alternative is AM-published data.
+
+* Needs client-side code capable of extracting information from YARN cluster
+to generate client-specific configuration.
+
+* Assumes (key, value) pairs are sufficient for client config generation. Again,
+some template expansion will aid here (this time: client-side interpretation).
+
+* Client config generators need to find and bind to the target application themselves.
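+
+A minimal sketch of a Hadoop-client XML generator, using Hadoop's own
+`Configuration` class; the pairs and output filename are illustrative:
+
+    import java.io.FileOutputStream;
+    import java.io.OutputStream;
+    import java.util.LinkedHashMap;
+    import java.util.Map;
+
+    import org.apache.hadoop.conf.Configuration;
+
+    public class XmlConfigGeneratorSketch {
+      public static void main(String[] args) throws Exception {
+        // The published (key, value) pairs; the source is illustrative.
+        Map<String, String> pairs = new LinkedHashMap<>();
+        pairs.put("yarn.resourcemanager.address", "rm-host:8032");
+
+        Configuration conf = new Configuration(false); // no default resources
+        for (Map.Entry<String, String> e : pairs.entrySet()) {
+          conf.set(e.getKey(), e.getValue());
+        }
+        try (OutputStream out = new FileOutputStream("client-site.xml")) {
+          conf.writeXml(out); // standard Hadoop <configuration> XML
+        }
+      }
+    }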
+
+ 
+
+Multiple options:
+
+* Standard ZK structure for YARN applications (maybe: YARN itself to register
+paths in ZK and set up child permissions, so enforcing security).
+
+* Agents to push dynamic information to ZK as K-V pairs
+
+* Agent provider on AM to fetch K-V pairs and include in status requests
+
+* CLI to fetch app config keys and echo out responses (needs client log4j settings
+to log all slf/log4j output to stderr, so that `app > results.txt` would save results explicitly)
+
+* Client-side code per app to generate specific binding information
+
+### Load-balancer YARN App
+
+Spread requests around a set of registered handlers, e.g. web servers. Support
+plugins for session binding/sharding.
+
+Some web servers can do this already; a custom YARN app could use embedded
+Grizzly. The binding problem still exists, but this would support scalable
+dispatch of values.
+
+* Could be offered as an AM extension (in provider, ...): scales well
+with the # of apps in the cluster, but adds initial location/failover problems.
+
+* If offered as a core-YARN service, location is handled via a fixed
+URL. This could place a high load on the service, even with just 302 redirects.
+

Propchange: incubator/slider/site/trunk/content/design/registry/the_YARN_application_registration_and_binding_problem.md
------------------------------------------------------------------------------
    svn:eol-style = native