Posted to commits@slider.apache.org by st...@apache.org on 2014/06/30 17:37:08 UTC
[19/50] [abbrv] SLIDER-121 removed site documentation from git source
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/markdown/specification/cli-actions.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/specification/cli-actions.md b/src/site/markdown/specification/cli-actions.md
deleted file mode 100644
index 5060ab5..0000000
--- a/src/site/markdown/specification/cli-actions.md
+++ /dev/null
@@ -1,675 +0,0 @@
-<!---
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
--->
-
-# Apache Slider CLI Actions
-
-
-## Important
-
-1. This document is still being updated from the original Hoya design.
-2. The new cluster model of separated specification files for internal, resource and application configuration
-has not been incorporated.
-3. What is up to date is the CLI command list and arguments.
-
-## Client configuration
-
-As well as the CLI options, the `conf/slider-client.xml` XML file can define arguments used to communicate with the application instance.
-
-
-#### `fs.defaultFS`
-
-Equivalent to setting the filesystem with `--filesystem`
-
-
-
-## Common
-
-### System Properties
-
-Arguments of the form `-S key=value` define JVM system properties.
-
-These are supported primarily to define options needed for some Kerberos configurations.
-
-### Definitions
-
-Arguments of the form `-D key=value` define client configuration options.
-
-These can define client options that are not set in `conf/slider-client.xml`, or override those that are.
-
-### Cluster names
-
-All actions that must take an instance name will fail with `EXIT_UNKNOWN_INSTANCE`
-if one is not provided.
-
-## Action: Build
-
-Builds a cluster -creates all the on-filesystem datastructures, and generates a cluster description
-that is both well-defined and deployable -*but does not actually start the cluster*
-
- build (instancename,
- options:List[(String,String)],
- components:List[(String, int)],
- componentOptions:List[(String,String, String)],
- resourceOptions:List[(String,String)],
- resourceComponentOptions:List[(String,String, String)],
- confdir: URI,
- provider: String,
- zkhosts,
- zkport,
- image,
- apphome,
- appconfdir)
-
-
-#### Preconditions
-
-(Note that the ordering of these preconditions is not guaranteed to remain constant)
-
-The instance name is valid
-
- if not valid-instance-name(instancename) : raise SliderException(EXIT_COMMAND_ARGUMENT_ERROR)
-
-The instance must not be live. This is purely a safety check as the next test should have the same effect.
-
- if slider-instance-live(YARN, instancename) : raise SliderException(EXIT_CLUSTER_IN_USE)
-
-The instance must not exist
-
- if is-dir(HDFS, instance-path(HDFS, instancename)) : raise SliderException(EXIT_CLUSTER_EXISTS)
-
-The configuration directory must exist and must contain only files. It does not have to be in the
-instance's HDFS filesystem, as it will be copied there.
-
- let FS = FileSystem.get(appconfdir)
- if not isDir(FS, appconfdir) : raise SliderException(EXIT_COMMAND_ARGUMENT_ERROR)
- forall f in children(FS, appconfdir) :
- if not isFile(f): raise IOException
-
-There's a race condition at build time: between the preconditions being met and the instance specification being saved, the instance
-may be created by another process. This is addressed by creating a lock file, `writelock.json`, in the destination directory. If the file
-exists, no other process may acquire the lock.
-
-There is a less exclusive readlock file, `readlock.json`, which may be created by any process that wishes to read the configuration.
-If it exists when another process wishes to access the files, the subsequent process may read the data, but MUST NOT delete it
-afterwards. A process attempting to acquire the writelock must check for the existence of this file before AND after creating the
-writelock file, failing if it is present. This retains a small race condition: a second or later reader may still be reading the data
-when a process successfully acquires the write lock. If this proves to be an issue, a stricter model could be implemented, with each reading process creating a uniquely named `readlock-` file.
-
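The write-lock rule above (check for the readlock, create the writelock exclusively, check for the readlock again) can be sketched as follows. This is only an illustration on a local filesystem, not Slider's implementation: the function name `try_acquire_writelock` is hypothetical, and removing the writelock when a reader is detected afterwards is an assumption, since the text only says the attempt fails.

```python
import os
import tempfile

WRITELOCK = "writelock.json"
READLOCK = "readlock.json"


def try_acquire_writelock(instance_dir: str) -> bool:
    """Try to take the write lock; return True only if it is now held."""
    readlock = os.path.join(instance_dir, READLOCK)
    writelock = os.path.join(instance_dir, WRITELOCK)
    # Check for readers before creating the write lock.
    if os.path.exists(readlock):
        return False
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(writelock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    # Check again: a reader may have appeared in the meantime.
    if os.path.exists(readlock):
        os.remove(writelock)  # assumption: back off by releasing the lock
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        print(try_acquire_writelock(demo_dir))  # first caller
        print(try_acquire_writelock(demo_dir))  # second caller
```

As the text notes, a reader that started before the first check and is still reading is not detected by this scheme.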
-
-
-
-#### Postconditions
-
-All the instance directories exist
-
- is-dir(HDFS', instance-path(HDFS', instancename))
- is-dir(HDFS', original-conf-path(HDFS', instancename))
- is-dir(HDFS', generated-conf-path(HDFS', instancename))
-
-The application cluster specification saved is well-defined and deployable
-
- let instance-description = parse(data(HDFS', instance-json-path(HDFS', instancename)))
- well-defined-instance(instance-description)
- deployable-application-instance(HDFS', instance-description)
-
-More precisely: the specification generated before it is saved as JSON is well-defined and deployable; no JSON file will be created
-if the validation fails.
-
-Fields in the cluster description have been filled in
-
- internal.global["internal.provider.name"] == provider
- app_conf.global["zookeeper.port"] == zkport
- app_conf.global["zookeeper.hosts"] == zkhosts
-
-
- package => app_conf.global["agent.package"] = package
-
-
-
-Any `apphome` and `image` properties have propagated
-
- apphome == null or clusterspec.options["cluster.application.home"] == apphome
- image == null or clusterspec.options["cluster.application.image.path"] == image
-
-(The `well-defined-application-instance()` requirement above defines the valid states
-of this pair of options)
-
-
-All role sizes have been mapped to `component.instances` fields
-
- forall (name, size) in components :
- resources.components[name]["component.instances"] == size
-
-
-
-
-All option parameters have been added to the `options` map in the specification
-
- forall (opt, val) in options :
- app_conf.global[opt] == val
-
- forall (opt, val) in resourceOptions :
- resource.global[opt] == val
-
-All component option parameters have been added to the specific component's option map
-in the relevant configuration file
-
- forall (name, opt, val) in componentOptions :
- app_conf.components[name][opt] == val
-
- forall (name, opt, val) in resourceComponentOptions :
- resources.components[name][opt] == val
-
-To avoid some confusion as to where keys go, all options beginning with the
-prefixes `component.`, `role.` or `yarn.` are automatically copied into the resources file:
-
- forall (opt, val) in options where startswith(opt, "component.")
- or startswith(opt, "role.")
- or startswith(opt, "yarn."):
- resource.global[opt] == val
-
- forall (name, opt, val) in componentOptions where startswith(opt, "component.")
- or startswith(opt, "role.")
- or startswith(opt, "yarn."):
- resources.components[name][opt] == val
-
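The prefix rule above can be sketched in a few lines. This is a minimal sketch, assuming options arrive as `(key, value)` pairs; the function name `split_options` and the option names in the usage note are hypothetical and are not taken from Slider.

```python
# Prefixes whose options are also copied into the resources file.
RESOURCE_PREFIXES = ("component.", "role.", "yarn.")


def split_options(options):
    """Return (app_conf_global, resources_global) built from (key, value) pairs."""
    app_conf_global = {}
    resources_global = {}
    for opt, val in options:
        # Every option lands in the application configuration.
        app_conf_global[opt] = val
        # Options with a resource prefix are copied, not moved.
        if opt.startswith(RESOURCE_PREFIXES):
            resources_global[opt] = val
    return app_conf_global, resources_global
```

For example, `split_options([("yarn.memory", "512"), ("site.example", "x")])` places both keys in the first map and only `yarn.memory` in the second. In this sketch a duplicated key simply takes the last value given; the specification leaves that case undefined.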
-
-There's no explicit rejection of duplicate options; the outcome of that
-state is 'undefined'.
-
-What is defined is that if Slider or its provider provided a default option value,
-the command-line supplied option will override it.
-
-All files that were in the configuration directory are now copied into the "original" configuration directory
-
- let FS = FileSystem.get(appconfdir)
- let dest = original-conf-path(HDFS', instancename)
- forall c in children(FS, appconfdir) :
- data(HDFS', dest + [filename(c)]) == data(FS, c)
-
-All files that were in the configuration directory now have equivalents in the generated configuration directory
-
- let FS = FileSystem.get(appconfdir)
- let dest = generated-conf-path(HDFS', instancename)
- forall c in children(FS, appconfdir) :
- isfile(HDFS', dest + [filename(c)])
-
-
-## Action: Thaw
-
- thaw <instancename> [--wait <timeout>]
-
-Thaw takes an application instance with configuration and (possibly) data on disk, and
-attempts to create a live application with the specified number of nodes
-
-#### Preconditions
-
- if not valid-instance-name(instancename) : raise SliderException(EXIT_COMMAND_ARGUMENT_ERROR)
-
-The cluster must not be live.
-
- if slider-instance-live(YARN, instancename) : raise SliderException(EXIT_CLUSTER_IN_USE)
-
-The cluster must exist
-
- if not is-dir(HDFS, application-instance-path(HDFS, instancename)) : raise SliderException(EXIT_UNKNOWN_INSTANCE)
-
-The cluster specification must exist, be valid and deployable
-
- if not is-file(HDFS, cluster-json-path(HDFS, instancename)) : raise SliderException(EXIT_UNKNOWN_INSTANCE)
- if not well-defined-application-instance(HDFS, application-instance-path(HDFS, instancename)) : raise SliderException(EXIT_BAD_CLUSTER_STATE)
- if not deployable-application-instance(HDFS, application-instance-path(HDFS, instancename)) : raise SliderException(EXIT_BAD_CLUSTER_STATE)
-
-#### Postconditions
-
-
-After the thaw has been performed, there is now a queued request in YARN
-for the chosen (how?) queue
-
- YARN'.Queues'[amqueue] = YARN.Queues[amqueue] + [launch("slider", instancename, requirements, context)]
-
-If a wait timeout was specified, the cli waits until the application is considered
-running by YARN (the AM is running), the wait timeout has been reached, or
-the application has failed
-
- waittime < 0 or (exists a in slider-running-application-instances(yarn-application-instances(YARN', instancename, user))
- where a.YarnApplicationState == RUNNING)
-
-
-## Outcome: AM-launched state
-
-Some time after the AM was queued, if the relevant
-prerequisites of the launch request are met, the AM will be deployed
-
-#### Preconditions
-
-* The resources referenced in HDFS (still) are accessible by the user
-* The requested YARN memory and core requirements could be met on the YARN cluster and
-specific YARN application queue.
-* There is sufficient capacity in the YARN cluster to create a container for the AM.
-
-#### Postconditions
-
-Define the YARN state at a specific time `t` as `YARN(t)`; the fact that
-an AM is launched afterwards can then be expressed as a predicate over a later state.
-
-The AM is deployed if there is some time `t` after the submission time `t0`
-where the application is listed
-
- exists t1 where t1 > t0 and slider-instance-live(YARN(t1), user, instancename)
-
-At which time there is a container in the cluster hosting the AM -its
-context is the launch context
-
- exists c in containers(YARN(t1)) where c.context == launch.context
-
-There's no way to determine when this time `t1` will be reached -or if it ever
-will -its launch may be postponed due to a lack of resources and/or higher priority
-requests using resources as they become available.
-
-For tests on a dedicated YARN cluster, a few tens of seconds appear to be enough
-for the AM-launched state to be reached, a failure to occur, or to conclude
-that the resource requirements are unsatisfiable.
-
-## Outcome: AM-started state
-
-A (usually short) time after the AM is launched, it should start, provided that:
-
-* The node hosting the container is working reliably
-* The supplied command line could start the process
-* The localized resources in the context could be copied to the container (which implies
-that they are readable by the user account the AM is running under)
-* The combined classpath of YARN, extra JAR files included in the launch context,
-and the resources in the slider client 'conf' dir contain all necessary dependencies
-to run Slider.
-* There's no issue with the cluster specification that causes the AM to exit
-with an error code.
-
-Node failures/command line failures are treated by YARN as an AM failure which
-will trigger a restart attempt -this may be on the same or a different node.
-
-#### Preconditions
-
-The AM was launched at an earlier time, `t1`
-
- exists t1 where t1 > t0 and am-launched(YARN(t1))
-
-
-#### Postconditions
-
-The application is actually started if it is listed in the YARN application list
-as being in the state `RUNNING`, an RPC port has been registered with YARN (visible as the `rpcPort`
-attribute in the YARN Application Report), and that port is servicing RPC requests
-from authenticated callers.
-
- exists t2 where:
- t2 > t1
- and slider-instance-live(YARN(t2), instancename, user)
- and slider-live-instances(YARN(t2))[0].rpcPort != 0
- and rpc-connection(slider-live-instances(YARN(t2))[0], SliderClusterProtocol)
-
-A test for accepting cluster requests is querying the cluster status
-with `SliderClusterProtocol.getJSONClusterStatus()`. If this returns
-a parseable cluster description, the AM considers itself live.
-
-## Outcome: Application Instance operational state
-
-Once started, Slider enters the operational state of trying to keep the numbers
-of live role instances matching the numbers specified in the cluster specification.
-
-The AM must request a container for each desired instance of each role of the
-application, wait for those requests to be granted, and then instantiate
-the specific application roles on the allocated containers.
-
-Such a request is made on startup, whenever a failure occurs, or when the
-cluster size is dynamically updated.
-
-The AM releases containers when the cluster size is shrunk during a flex operation,
-or during teardown.
-
-### Steady state condition
-
-The steady state of a Slider cluster is that the number of live instances of a role,
-plus the number of requested instances, minus the number of instances for
-which release requests have been made, must match the desired number.
-
-If the internal state of the Slider AM is defined as `AppState`, then:
-
- forall r in clusterspec.roles :
- r["yarn.component.instances"] ==
- AppState.Roles[r].live + AppState.Roles[r].requested - AppState.Roles[r].released
-
-The `AppState` represents Slider's view of the external YARN system state, based on its
-history of notifications received from YARN.
-
-It is indirectly observable from the cluster status, which can be queried from the AM:
-
-
- forall r in AM.getJSONClusterStatus().roles :
- r["yarn.component.instances"] ==
- r["role.actual.instances"] + r["role.requested.instances"] - r["role.releasing.instances"]
-
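The observable form of the steady-state condition can be checked mechanically against a parsed status document. The sketch below assumes the status has been parsed into a dictionary whose `"roles"` entry maps role names to dictionaries holding the four counters named above; that shape, and the function name `steady_state_violations`, are assumptions made for illustration.

```python
def steady_state_violations(status):
    """Return the names of roles whose counters break the steady-state equation."""
    violations = []
    for name, role in status["roles"].items():
        desired = int(role["yarn.component.instances"])
        actual = int(role["role.actual.instances"])
        requested = int(role["role.requested.instances"])
        releasing = int(role["role.releasing.instances"])
        # desired == live + requested - released, per the condition above.
        if desired != actual + requested - releasing:
            violations.append(name)
    return violations
```

An empty result means every role satisfies the equation; it does not mean that outstanding requests have been granted.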
-Slider does not consider it an error if the number of actual instances remains below
-the desired value (i.e. outstanding requests are not being satisfied) -this is
-an operational state of the cluster that Slider cannot address.
-
-### Cluster startup
-
-On a healthy dedicated test cluster, the time for the requests to be satisfied is
-a few tens of seconds at most: a failure to achieve this state is a sign of a problem.
-
-### Node or process failure
-
-After a container or node failure, a new container for a new instance of that role
-is requested.
-
-The failure count is incremented -it can be accessed via the `"role.failed.instances"`
-attribute of a role in the status report.
-
-The number of failures of a role is tracked, and is used by Slider to decide when to
-conclude that the role is somehow failing consistently -and that it should fail the
-entire application.
-
-This has initially been implemented as a simple counter, with the cluster
-option: `"slider.container.failure.threshold"` defining that threshold.
-
- let status = AM.getJSONClusterStatus()
- forall r in status.roles :
- r["role.failed.instances"] < status.options["slider.container.failure.threshold"]
-
-
-### Instance startup failure
-
-
-Startup failures are measured alongside general node failures.
-
-A container is deemed to have failed to start if either of the following conditions
-is met:
-
-1. The AM received an `onNodeManagerContainerStartFailed` event.
-
-1. The AM received an `onCompletedNode` event on a node that started less than
-a specified number of seconds earlier -a number given in the cluster option
-`"slider.container.failure.shortlife"`.
-
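The two conditions above amount to a small classification rule. The following is a sketch only: the event names are the ones given in the list, while the function name, the use of plain timestamps in seconds, and passing the `"slider.container.failure.shortlife"` value as a parameter are assumptions.

```python
def is_startup_failure(event, started_at, completed_at, shortlife_seconds):
    """Classify a container event as a startup failure per the two rules above."""
    # Rule 1: the node manager reported that the container could not start.
    if event == "onNodeManagerContainerStartFailed":
        return True
    # Rule 2: the container completed less than `shortlife_seconds` after starting.
    if event == "onCompletedNode":
        return (completed_at - started_at) < shortlife_seconds
    return False
```

For instance, a completion 5 seconds after start with a short-life limit of 60 seconds counts as a startup failure, while one after 600 seconds does not.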
-More sophisticated failure handling logic than is currently implemented may treat
-startup failures differently from ongoing failures -as they can usually be
-treated as a sign that the container is failing to launch the program reliably:
-either the generated command line is invalid, or the application is failing
-to run or is exiting immediately or almost immediately.
-
-## Action: Create
-
-Create is simply `build` + `thaw` in sequence - the postconditions from the first
-action are intended to match the preconditions of the second.
-
-## Action: Freeze
-
- freeze instancename [--wait time] [--message message]
-
-The *freeze* action "freezes" the cluster: all its nodes running in the YARN
-cluster are stopped, leaving all the persistent state.
-
-The operation is intended to be idempotent: it is not an error if
-freeze is invoked on an already frozen cluster
-
-#### Preconditions
-
-The cluster name is valid and it matches a known cluster
-
- if not valid-instance-name(instancename) : raise SliderException(EXIT_COMMAND_ARGUMENT_ERROR)
-
- if not is-file(HDFS, application-instance-path(HDFS, instancename)) :
- raise SliderException(EXIT_UNKNOWN_INSTANCE)
-
-#### Postconditions
-
-If the cluster was running, an RPC call `stopCluster(message)` has been sent to it.
-
-If the `--wait` argument specified a wait time, then the command will block
-until the cluster has finished or the wait time was exceeded.
-
-If the `--message` argument specified a message -it must appear in the
-YARN logs as the reason the cluster was frozen.
-
-
-In all cases the outcome should be the same:
-
- not slider-instance-live(YARN', instancename)
-
-## Action: Flex
-
-Flex the cluster size: add or remove roles.
-
- flex instancename
- components:List[(String, int)]
-
-1. The JSON cluster specification in the filesystem is updated
-1. if the cluster is running, it is given the new cluster specification,
-which will change the desired steady-state of the application
-
-#### Preconditions
-
- if not is-file(HDFS, cluster-json-path(HDFS, instancename)) :
- raise SliderException(EXIT_UNKNOWN_INSTANCE)
-
-#### Postconditions
-
- let originalSpec = data(HDFS, cluster-json-path(HDFS, instancename))
-
- let updatedSpec = originalSpec where:
- forall (name, size) in components :
- updatedSpec.roles[name]["yarn.component.instances"] == size
- data(HDFS', cluster-json-path(HDFS', instancename)) == updatedSpec
- rpc-connection(slider-live-instances(YARN(t2))[0], SliderClusterProtocol)
- let flexed = rpc-connection(slider-live-instances(YARN(t2))[0], SliderClusterProtocol).flexCluster(updatedSpec)
-
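The first part of the flex postcondition, deriving `updatedSpec` from `originalSpec`, can be illustrated as below. This sketch assumes the specification is a parsed JSON dictionary with a `"roles"` map, and that every flexed role already exists in it; the function name `flex_spec` is hypothetical, and the RPC call to a running AM is not modelled.

```python
import copy


def flex_spec(original_spec, components):
    """Return a copy of the spec with the requested role sizes applied."""
    updated = copy.deepcopy(original_spec)  # leave the original untouched
    for name, size in components:
        updated["roles"][name]["yarn.component.instances"] = size
    return updated
```

Roles that are not named in `components` keep their existing sizes.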
-
-#### AM actions on flex
-
- boolean SliderAppMaster.flexCluster(ClusterDescription updatedSpec)
-
-If the cluster is in a state where flexing is possible (i.e. it is not in teardown),
-then `AppState` is updated with the new desired role counts. The operation will
-return once all requests to add or remove role instances have been queued,
-and be `True` iff the desired steady state of the cluster has been changed.
-
-#### Preconditions
-
- well-defined-application-instance(HDFS, updatedSpec)
-
-
-#### Postconditions
-
- forall role in AppState.Roles.keys:
- AppState'.Roles'[role].desiredCount == updatedSpec.roles[role]["yarn.component.instances"]
- result = AppState' != AppState
-
-
-The flexing may change the desired steady state of the cluster, in which
-case the relevant requests will have been queued by the completion of the
-action. It is not possible to state whether or when the requests will be
-satisfied.
-
-## Action: Destroy
-
-Idempotent operation to destroy a frozen cluster -it succeeds if the
-cluster has already been destroyed/is unknown, but not if it is
-actually running.
-
-#### Preconditions
-
- if not valid-instance-name(instancename) : raise SliderException(EXIT_COMMAND_ARGUMENT_ERROR)
-
- if slider-instance-live(YARN, instancename) : raise SliderException(EXIT_CLUSTER_IN_USE)
-
-
-#### Postconditions
-
-The cluster directory and all its children do not exist
-
- not is-dir(HDFS', application-instance-path(HDFS', instancename))
-
-
-## Action: Status
-
- status instancename [--out outfile]
-
-#### Preconditions
-
- if not slider-instance-live(YARN, instancename) : raise SliderException(EXIT_UNKNOWN_INSTANCE)
-
-#### Postconditions
-
-The status of the application has been successfully queried and printed out:
-
- let status = slider-live-instances(YARN).rpcPort.getJSONClusterStatus()
-
-If the `outfile` value is not defined then the status appears as part of stdout
-
- status in STDOUT'
-
-otherwise, the outfile exists in the local filesystem
-
- (outfile != "") ==> data(LocalFS', outfile) == status
- (outfile != "") ==> status in STDOUT'
-
-## Action: Exists
-
-This probes for a named cluster being defined or actually being in the running
-state.
-
-In the running state, it is essentially the status
-operation with only the exit code returned.
-
-#### Preconditions
-
-
- if not is-file(HDFS, application-instance-path(HDFS, instancename)) :
- raise SliderException(EXIT_UNKNOWN_INSTANCE)
-
-#### Postconditions
-
-The operation succeeds if the cluster is running and the RPC call returns the cluster
-status.
-
- if live and not slider-instance-live(YARN, instancename):
- retcode = -1
- else:
- retcode = 0
-
-## Action: getConf
-
-This returns the live client configuration of the cluster -the
-site-xml file.
-
- getconf --format (xml|properties) --out [outfile]
-
-*We may want to think hard about whether this is needed*
-
-#### Preconditions
-
- if not slider-instance-live(YARN, instancename) : raise SliderException(EXIT_UNKNOWN_INSTANCE)
-
-
-#### Postconditions
-
-The operation succeeds if the cluster status can be retrieved and saved to
-the named file/printed to stdout in the format chosen
-
- let status = slider-live-instances(YARN).rpcPort.getJSONClusterStatus()
- let conf = status.clientProperties
- if format == "xml" :
- let body = status.clientProperties.asXmlDocument()
- else:
- let body = status.clientProperties.asProperties()
-
- if outfile != "" :
- data(LocalFS', outfile) == body
- else:
- body in STDOUT'
-
-## Action: list
-
- list [instancename]
-
-Lists all clusters of a user, or only the one given
-
-#### Preconditions
-
-If an instancename is specified, it must be in YARN's list of active or completed applications
-of that user:
-
- if instancename != "" and [] == yarn-application-instances(YARN, instancename, user) :
- raise SliderException(EXIT_UNKNOWN_INSTANCE)
-
-
-#### Postconditions
-
-If no instancename was given, all slider applications of that user are listed,
-else only the one running (or one of the finished ones)
-
- if instancename == "" :
- forall a in yarn-application-instances(YARN, user) :
- a.toString() in STDOUT'
- else:
- let e = yarn-application-instances(YARN, instancename, user)
- e.toString() in STDOUT'
-
-## Action: killcontainer
-
-This is an operation added for testing. It will kill a container in the cluster
-*without flexing the cluster size*. As a result, the cluster will detect the
-failure and attempt to recover from it by instantiating a new instance
-of the affected role.
-
- killcontainer cluster --id container-id
-
-#### Preconditions
-
- if not slider-instance-live(YARN, instancename) : raise SliderException(EXIT_UNKNOWN_INSTANCE)
-
- exists c in slider-app-containers(YARN, instancename, user) where c.id == container-id
-
- let status = AM.getJSONClusterStatus()
- exists role in status.instances where container-id in status.instances[role].values
-
-
-#### Postconditions
-
-The container is not in the list of containers in the cluster
-
- not exists c in containers(YARN') where c.id == container-id
-
-And implicitly, not in the running containers of that application
-
- not exists c in slider-app-containers(YARN', instancename, user) where c.id == container-id
-
-At some time `t1 > t`, the status of the application (`AM'`) will be updated to reflect
-that YARN has notified the AM of the loss of the container
-
-
- let status' = AM'.getJSONClusterStatus()
- len(status'.instances[role]) < len(status.instances[role])
- status'.roles[role]["role.failed.instances"] == status.roles[role]["role.failed.instances"] + 1
-
-
-At some time `t2 > t1` in the future, the size of the containers of the application
-in the YARN cluster `YARN''` will be as before
-
- let status'' = AM''.getJSONClusterStatus()
- len(status''.instances[role]) == len(status.instances[role])
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/markdown/specification/index.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/specification/index.md b/src/site/markdown/specification/index.md
deleted file mode 100644
index f4c8d67..0000000
--- a/src/site/markdown/specification/index.md
+++ /dev/null
@@ -1,41 +0,0 @@
-<!---
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
--->
-
-# Specification of Apache Slider behaviour
-
-This is a "more rigorous" definition of the behavior of Slider in terms
-of its state and its command-line operations -by defining a 'formal' model
-of HDFS, YARN and Slider's internal state, then describing the operations
-that can take place in terms of their preconditions and postconditions.
-
-This is to show what tests we can create to verify that an action
-with a valid set of preconditions results in an outcome whose postconditions
-can be verified. It also makes more apparent what conditions should be
-expected to result in failures, as well as what the failure codes should be.
-
-Specifying the behavior has also helped identify areas where there was ambiguity,
-where clarification and more tests were needed.
-
-The specification depends on ongoing work in [HADOOP-9361](https://issues.apache.org/jira/browse/HADOOP-9361)
-to define the Hadoop Filesystem APIs. This specification uses [the same notation](https://github.com/steveloughran/hadoop-trunk/blob/stevel/HADOOP-9361-filesystem-contract/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/notation.md)
-
-
-1. [Model: YARN And Slider](slider-model.html)
-1. [CLI actions](cli-actions.html)
-
-Exceptions and operations may specify exit codes -these are listed in
-[Client Exit Codes](../exitcodes.html)
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/markdown/specification/slider-model.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/specification/slider-model.md b/src/site/markdown/specification/slider-model.md
deleted file mode 100644
index 75f8c68..0000000
--- a/src/site/markdown/specification/slider-model.md
+++ /dev/null
@@ -1,286 +0,0 @@
-<!---
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
--->
-
-# Formal Apache Slider Model
-
-This is the model of Slider and YARN for the rest of the specification.
-
-## File System
-
-A File System `HDFS` represents a Hadoop FileSystem -either HDFS or another File
-System which spans the cluster. There are also other filesystems that
-can act as sources of data that is then copied into HDFS. These will be marked
-as `FS` or with the generic `FileSystem` type.
-
-
-There's ongoing work in [HADOOP-9361](https://issues.apache.org/jira/browse/HADOOP-9361)
-to define the Hadoop Filesystem APIs using the same notation as here,
-the latest version being available on [GitHub](https://github.com/steveloughran/hadoop-trunk/tree/stevel/HADOOP-9361-filesystem-contract/hadoop-common-project/hadoop-common/src/site/markdown/filesystem).
-Two key references are
-
- 1. [The notation reused in the Slider specifications](https://github.com/steveloughran/hadoop-trunk/blob/stevel/HADOOP-9361-filesystem-contract/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/notation.md)
- 1. [The model of the filesystem](https://github.com/steveloughran/hadoop-trunk/blob/stevel/HADOOP-9361-filesystem-contract/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/model.md)
-
- The model and its predicates and invariants will be used in these specifications.
-
-## YARN
-
-From the perspective of a YARN application, the YARN runtime is a state, `YARN`,
-comprised of: `(Apps, Queues, Nodes)`
-
- Apps: Map[AppId, ApplicationReport]
-
-An application has a name, an application report and a list of outstanding requests
-
- App: (Name, report: ApplicationReport, Requests:List[AmRequest])
-
-An application report contains a mixture of static and dynamic state of the application
-and the AM.
-
- ApplicationReport: AppId, Type, User, YarnApplicationState, AmContainer, RpcPort, TrackingURL
-
-YARN applications have a number of states. These are ordered such that if the
-`state.ordinal() > RUNNING.ordinal() ` then the application has entered an exit state.
-
- YarnApplicationState : [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED ]
-
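The ordering rule can be made concrete with an ordered enumeration. The sketch below mirrors the state list exactly as written in this model; it is not the Hadoop Java enum itself, and the helper name `has_exited` is hypothetical.

```python
from enum import IntEnum


class YarnApplicationState(IntEnum):
    """States in the order given by the model above."""
    NEW = 0
    NEW_SAVING = 1
    SUBMITTED = 2
    ACCEPTED = 3
    RUNNING = 4
    FINISHED = 5
    FAILED = 6
    KILLED = 7


def has_exited(state):
    """True if the state lies after RUNNING in the ordering, i.e. is an exit state."""
    return state > YarnApplicationState.RUNNING
```

With this ordering, `FINISHED`, `FAILED` and `KILLED` are the exit states, and `RUNNING` itself is not one.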
-AMs can request containers to be added or released
-
- AmRequest = { add-container(priority, requirements), release(containerId)}
-
-Job queues are named queues of job requests; there is always a queue called `"default"`
-
- Queues: Map[String, Queue]
- Queue: List[Requests]
- Request = {
- launch(app-name, app-type, requirements, context)
- }
- Context: (localized-resources: Map[String,URL], command)
-
-
-This doesn't completely model the cluster from the AM perspective -there's no
-notion of node operations (launching code in a container) or events coming from YARN.
-
-The `Nodes` structure models the nodes in a cluster
-
- Nodes: Map[nodeID,(name, containers:List[Container])]
-
-A container contains some state
-
- Container: (containerId, appId, context)
-
-The containers in a cluster are the aggregate set of all containers across
-all nodes
-
- def containers(YARN) =
- [c for n in keys(YARN.Nodes) for c in YARN.Nodes[n].containers ]
-
-
-The containers of an application are all containers that are considered owned by it.
-
- def app-containers(YARN, appId: AppId) =
- [c in containers(YARN) where c.appId == appId ]
-
-### Operations & predicates used in the specifications
-
-
- def applications(YARN, type) =
- [ app.report for app in YARN.Apps.values where app.report.Type == type]
-
- def user-applications(YARN, type, user) =
- [a in applications(YARN, type) where: a.User == user]
-
-
-## UserGroupInformation
-
-Applications are launched and executed on host computers: either client machines
-or nodes in the cluster. These have their own state, which may need modeling
-
- HostState: Map[String, String]
-
-A key part of the host state is actually the identity of the current user,
-which is used to define the location of the persistent state of the cluster -including
-its data, and the identity under which a deployed container executes.
-
-In a secure cluster, this identity is accompanied by kerberos tokens that grant the caller
-access to the filesystem and to parts of YARN itself.
-
-This specification does not currently explicitly model the username and credentials.
-If it did they would be used throughout the specification to bind to a YARN or HDFS instance.
-
-`UserGroupInformation.getCurrentUser(): UserGroupInformation`
-
-Returns the current user information. This information is immutable and fixed for the duration of the process.
-
-
-
-## Slider Model
-
-### Cluster name
-
-A valid cluster name is a non-empty name which follows the internet hostname scheme of a letter followed by letters, digits or hyphens
-
- def valid-cluster-name(c) =
- len(c) > 0
- and c[0] in ['a'..'z']
- and forall ch in c[1:]: ch in (['a'..'z'] + ['-'] + ['0'..'9'])
-
-### Persistent Cluster State
-
-A Slider cluster's persistent state is stored in a path
-
- def cluster-path(FS, clustername) = user-home(FS) + ["clusters", clustername]
- def cluster-json-path(FS, clustername) = cluster-path(FS, clustername) + ["cluster.json"]
- def original-conf-path(FS, clustername) = cluster-path(FS, clustername) + ["original"]
- def generated-conf-path(FS, clustername) = cluster-path(FS, clustername) + ["generated"]
- def data-path(FS, clustername) = cluster-path(FS, clustername) + ["data"]
-
-When a cluster is built/created the specified original configuration directory
-is copied to `original-conf-path(FS, clustername)`; this is patched for the
-specific instance bindings and saved into `generated-conf-path(FS, clustername)`.
-
-A cluster *exists* if all of these paths are found:
-
- def cluster-exists(FS, clustername) =
- is-dir(FS, cluster-path(FS, clustername))
- and is-file(FS, cluster-json-path(FS, clustername))
- and is-dir(FS, original-conf-path(FS, clustername))
- and is-dir(FS, generated-conf-path(FS, clustername))
-
-A cluster is considered `running` if there is an application of the Slider type belonging to the current user in one of the states
-`{NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING}`.
-
- def final-yarn-states = {FINISHED, FAILED, KILLED }
-
- def slider-app-instances(YARN, clustername, user) =
- [a in user-applications(YARN, "slider", user) where:
- a.Name == clustername]
-
- def slider-app-running-instances(YARN, clustername, user) =
- [a in slider-app-instances(YARN, clustername, user) where:
- not a.YarnApplicationState in final-yarn-states]
-
- def slider-app-running(YARN, clustername, user) =
- [] != slider-app-running-instances(YARN, clustername, user)
-
- def slider-app-live-instances(YARN, clustername, user) =
- [a in slider-app-instances(YARN, clustername, user) where:
- a.YarnApplicationState == RUNNING]
-
- def slider-app-live(YARN, clustername, user) =
- [] != slider-app-live-instances(YARN, clustername, user)
-
-### Invariant: there must never be more than one running instance of a named Slider cluster
-
-
-There must never be more than one instance of the same Slider cluster running:
-
- forall a in user-applications(YARN, "slider", user):
- len(slider-app-running-instances(YARN, a.Name, user)) <= 1
-
-There may be multiple instances in a finished state, and one running instance alongside multiple finished instances -the applications
-that work with Slider MUST select a running cluster ahead of any terminated clusters.
-
-### Containers of an application
-
-
-The containers of a slider application are the set of containers of that application
-
- def slider-app-containers(YARN, clustername, user) =
- app-containers(YARN, appid where
- appid = slider-app-running-instances(YARN, clustername, user)[0])
-
-
-
-
-### RPC Access to a slider cluster
-
-
- An application is accepting RPC requests for a given protocol if there is a port binding
- defined and it is possible to authenticate a connection using the specified protocol
-
- def rpc-connection(appReport, protocol) =
- appReport.host != null
- and appReport.rpcPort != 0
- and RPC.getProtocolProxy(appReport.host, appReport.rpcPort, protocol)
-
- Being able to open an RPC port is the strongest definition of liveness possible
- to make: if the AM responds to RPC operations, it is doing useful work.
-
-### Valid Cluster Description
-
-The `cluster.json` file of a cluster configures Slider to deploy the application.
-
-#### well-defined-cluster(cluster-description)
-
-A Cluster Description is well-defined if it is valid JSON and required properties are present
-
-**OBSOLETE**
-
-
-Irrespective of specific details for deploying the Slider AM or any provider-specific role instances,
-a Cluster Description defined in a `cluster.json` file at the path `cluster-json-path(FS, clustername)`
-is well-defined if
-
-1. It is parseable by the jackson JSON parser.
-1. Root elements required of a Slider cluster specification must be defined, and, where appropriate, non-empty
-1. It contains the extensible elements required of a Slider cluster specification. For example, `options` and `roles`
-1. The types of the extensible elements match those expected by Slider.
-1. The `version` element matches a supported version
-1. Exactly one of `options/cluster.application.home` and `options/cluster.application.image.path` must exist.
-1. Any cluster options that are required to be integers must be integers
-
-This specification is very vague here to avoid duplication: the cluster description structure is currently implicitly defined in
-`org.apache.slider.api.ClusterDescription`
-
-Currently Slider ignores unknown elements during parsing. This may be changed.
-
-The test for this state does not refer to the cluster filesystem
-
-#### deployable-cluster(FS, cluster-description)
-
-A Cluster Description defines a deployable cluster if it is a well-defined cluster and its contents contain valid information to deploy a cluster
-
-This extends the definition of a well-defined cluster description with the following requirements:
-
-* The entry `name` must match a supported provider
-* Any elements that name the cluster match the cluster name as defined by the path to the cluster:
-
- originConfigurationPath == original-conf-path(FS, clustername)
- generatedConfigurationPath == generated-conf-path(FS, clustername)
- dataPath == data-path(FS, clustername)
-
-* The paths defined in `originConfigurationPath` , `generatedConfigurationPath` and `dataPath` must all exist.
-* `options/zookeeper.path` must be defined and refer to a path in the ZK cluster
-defined by (`options/zookeeper.hosts`, `zookeeper.port`) to which the user has write access (required by HBase and Accumulo)
-* If `options/cluster.application.image.path` is defined, it must exist and be readable by the user.
-* It must declare a type that maps to a provider entry in the Slider client's XML configuration:
-
- len(clusterspec["type"]) > 0
- clientconfig["slider.provider."+ clusterspec["type"]] != null
-
-* That entry must map to a class on the classpath which can be instantiated
-and cast to `SliderProviderFactory`.
-
- let classname = clientconfig["slider.provider."+ clusterspec["type"]]
- (Class.forName(classname).newInstance()) instanceof SliderProviderFactory
-
-#### valid-for-provider(cluster-description, provider)
-
-A provider considers a specification valid if its own validation logic is satisfied. This normally
-consists of rules about the number of instances of different roles; it may include other logic.
-
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/markdown/troubleshooting.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/troubleshooting.md b/src/site/markdown/troubleshooting.md
deleted file mode 100644
index 42bef8e..0000000
--- a/src/site/markdown/troubleshooting.md
+++ /dev/null
@@ -1,154 +0,0 @@
-<!---
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
--->
-
-# Apache Slider Troubleshooting
-
-Slider can be tricky to start using, because it combines the need to set
-up a YARN application with the need to have an HBase configuration
-that works.
-
-
-## Common problems
-
-### Classpath for Slider AM wrong
-
-The Slider Application Master, the "Slider AM", builds up its classpath from
-those JARs it has locally and the JARs pre-installed on the classpath.
-
-This often surfaces in an exception that can be summarized as
-"hadoop-common.jar is not on the classpath":
-
- Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/ExitUtil$ExitException
- Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.ExitUtil$ExitException
- at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
- at java.security.AccessController.doPrivileged(Native Method)
- at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
- at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
- at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
- at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
- Could not find the main class: org.apache.hadoop.yarn.service.launcher.ServiceLauncher. Program will exit.
-
-
-For Ambari-managed deployments, we recommend the following:
-
-
- <property>
- <name>yarn.application.classpath</name>
- <value>
- /etc/hadoop/conf,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*,/usr/lib/hadoop-hdfs/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-mapreduce/lib/*
- </value>
- </property>
-
-The `yarn-site.xml` file for the site will contain the relevant value.
-
-### Application Instantiation fails, "TriggerClusterTeardownException: Unstable Cluster"
-
-Slider gives up if it cannot keep enough instances of a role running -or more
-precisely, if they keep failing.
-
-If this happens on cluster startup, it means that the application is not working:
-
- org.apache.slider.core.exceptions.TriggerClusterTeardownException: Unstable Cluster:
- - failed with role worker failing 4 times (4 in startup); threshold is 2
- - last failure: Failure container_1386872971874_0001_01_000006 on host 192.168.1.86,
- see http://hor12n22.gq1.ygridcore.net:19888/jobhistory/logs/192.168.1.86:45454/container_1386872971874_0001_01_000006/ctx/yarn
-
-This message warns that a role -here `worker`- is failing to start and has failed
-more times than the configured failure threshold allows. What it doesn't do is say why it failed,
-because that is not something the AM knows -that is a fact hidden in the logs on
-the container that failed.
-
-The final bit of the exception message can help you track down the problem,
-as it points you to the logs.
-
-In the example above the failure was in `container_1386872971874_0001_01_000006`
-on the host `192.168.1.86`. If you go to the node manager on that machine (the YARN
-RM web page will let you do this), and look for that container,
-you may be able to grab the logs from it.
-
-A quicker way is to browse to the URL on the last line of the exception message.
-Note: the URL depends on `yarn.log.server.url` being properly configured.
-
-It is from those logs that the cause of the problem can be determined, because they are the actual
-output of the actual application which Slider is trying to deploy.
-
-
-
-### Not all the containers start -but whenever you kill one, another one comes up.
-
-This is often caused by YARN not having enough capacity in the cluster to start
-up the requested set of containers. The AM has submitted a list of container
-requests to YARN, but only when an existing container is released or killed
-is one of the outstanding requests granted.
-
-Fix #1: Ask for smaller containers
-
-Edit the `yarn.memory` option for roles to be smaller: set it to 64 for a smaller
-YARN allocation. *This does not affect the actual heap size of the
-application component deployed*
-
-Fix #2: Tell YARN to be less strict about memory consumption
-
-Here are the properties in `yarn-site.xml` which we set to allow YARN
-to schedule more role instances than it nominally has room for.
-
- <property>
- <name>yarn.scheduler.minimum-allocation-mb</name>
- <value>1</value>
- </property>
- <property>
- <description>Whether physical memory limits will be enforced for
- containers.
- </description>
- <name>yarn.nodemanager.pmem-check-enabled</name>
- <value>false</value>
- </property>
- <!-- we really don't want checking here-->
- <property>
- <name>yarn.nodemanager.vmem-check-enabled</name>
- <value>false</value>
- </property>
-
-If you create too many instances, your hosts will start swapping and
-performance will collapse -we do not recommend using this in production.
-
-
-### Configuring YARN for better debugging
-
-
-One configuration to aid debugging is to tell the nodemanagers to
-keep data for a short period after containers finish:
-
- <!-- 10 minutes after a failure to see what is left in the directory-->
- <property>
- <name>yarn.nodemanager.delete.debug-delay-sec</name>
- <value>600</value>
- </property>
-
-You can then retrieve logs either through the web UI, or by connecting to the
-server (usually by `ssh`) and retrieving the logs from the log directory.
-
-
-We also recommend making sure that YARN kills processes
-
- <!--time before the process gets a -9 -->
- <property>
- <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
- <value>30000</value>
- </property>
-
-
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/resources/hoya_am_architecture.png
----------------------------------------------------------------------
diff --git a/src/site/resources/hoya_am_architecture.png b/src/site/resources/hoya_am_architecture.png
deleted file mode 100644
index 191a8db..0000000
Binary files a/src/site/resources/hoya_am_architecture.png and /dev/null differ
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/resources/images/app_config_folders_01.png
----------------------------------------------------------------------
diff --git a/src/site/resources/images/app_config_folders_01.png b/src/site/resources/images/app_config_folders_01.png
deleted file mode 100644
index 4e78b63..0000000
Binary files a/src/site/resources/images/app_config_folders_01.png and /dev/null differ
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/resources/images/app_package_sample_04.png
----------------------------------------------------------------------
diff --git a/src/site/resources/images/app_package_sample_04.png b/src/site/resources/images/app_package_sample_04.png
deleted file mode 100644
index 170256b..0000000
Binary files a/src/site/resources/images/app_package_sample_04.png and /dev/null differ
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/resources/images/image_0.png
----------------------------------------------------------------------
diff --git a/src/site/resources/images/image_0.png b/src/site/resources/images/image_0.png
deleted file mode 100644
index e62a3e7..0000000
Binary files a/src/site/resources/images/image_0.png and /dev/null differ
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/resources/images/image_1.png
----------------------------------------------------------------------
diff --git a/src/site/resources/images/image_1.png b/src/site/resources/images/image_1.png
deleted file mode 100644
index d0888ac..0000000
Binary files a/src/site/resources/images/image_1.png and /dev/null differ
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/resources/images/managed_client.png
----------------------------------------------------------------------
diff --git a/src/site/resources/images/managed_client.png b/src/site/resources/images/managed_client.png
deleted file mode 100644
index 9c094b1..0000000
Binary files a/src/site/resources/images/managed_client.png and /dev/null differ
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/resources/images/slider-container.png
----------------------------------------------------------------------
diff --git a/src/site/resources/images/slider-container.png b/src/site/resources/images/slider-container.png
deleted file mode 100644
index 2e02833..0000000
Binary files a/src/site/resources/images/slider-container.png and /dev/null differ
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/resources/images/unmanaged_client.png
----------------------------------------------------------------------
diff --git a/src/site/resources/images/unmanaged_client.png b/src/site/resources/images/unmanaged_client.png
deleted file mode 100644
index 739d56d..0000000
Binary files a/src/site/resources/images/unmanaged_client.png and /dev/null differ
http://git-wip-us.apache.org/repos/asf/incubator-slider/blob/209cee43/src/site/site.xml
----------------------------------------------------------------------
diff --git a/src/site/site.xml b/src/site/site.xml
deleted file mode 100644
index 12dc5cf..0000000
--- a/src/site/site.xml
+++ /dev/null
@@ -1,84 +0,0 @@
-<?xml version="1.0"?>
-<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
--->
-<project name="Apache Slider ${project.version} (incubating)">
-<!--
-
- <skin>
- <groupId>org.apache.maven.skins</groupId>
- <artifactId>maven-stylus-skin</artifactId>
- <version>1.2</version>
- </skin>
-
- <skin>
- <groupId>org.apache.maven.skins</groupId>
- <artifactId>maven-application-skin</artifactId>
- <version>1.0</version>
- </skin>
-
--->
-
- <skin>
- <groupId>org.apache.maven.skins</groupId>
- <artifactId>maven-fluido-skin</artifactId>
- <version>1.3.0</version>
- </skin>
-
- <custom>
- <fluidoSkin>
- <topBarEnabled>true</topBarEnabled>
- <sideBarEnabled>false</sideBarEnabled>
- </fluidoSkin>
- </custom>
-
- <version position="right"/>
-
- <bannerLeft>
- <name>Apache Slider (incubating)</name>
- <href>http://slider.incubator.apache.org</href>
- </bannerLeft>
-
- <bannerRight>
- <src>http://incubator.apache.org/images/apache-incubator-logo.png</src>
- </bannerRight>
-
- <body>
-
- <menu ref="reports"/>
-
- <menu name="Documents">
- <item name="Getting Started" href="/getting_started.html"/>
- <item name="manpage" href="/manpage.html"/>
- <item name="Troubleshooting" href="/troubleshooting.html"/>
- <item name="Architecture" href="/architecture/index.html"/>
- <item name="Developing" href="/developing/index.html"/>
- <item name="Exitcodes" href="/exitcodes.html"/>
- </menu>
-
- <menu name="ASF">
- <item name="How Apache Works" href="http://www.apache.org/foundation/how-it-works.html"/>
- <item name="Developer Documentation" href="http://www.apache.org/dev/"/>
- <item name="Foundation" href="http://www.apache.org/foundation/"/>
- <item name="Sponsor Apache" href="http://www.apache.org/foundation/sponsorship.html"/>
- <item name="Thanks" href="http://www.apache.org/foundation/thanks.html"/>
- </menu>
-
- <footer>
- <div class="row-fluid">Apache Slider, Slider, Apache, and the Apache Incubator logo are trademarks of The Apache Software Foundation.</div>
- </footer>
- </body>
-</project>