Posted to commits@aurora.apache.org by re...@apache.org on 2018/04/03 23:45:43 UTC

svn commit: r1828293 [9/9] - in /aurora/site: data/ publish/ publish/blog/ publish/documentation/0.10.0/ publish/documentation/0.10.0/build-system/ publish/documentation/0.10.0/client-cluster-configuration/ publish/documentation/0.10.0/client-commands/...

Modified: aurora/site/source/documentation/latest/development/db-migration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/db-migration.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/db-migration.md (original)
+++ aurora/site/source/documentation/latest/development/db-migration.md Tue Apr  3 23:45:31 2018
@@ -14,7 +14,7 @@ When adding or altering tables or changi
 [schema.sql](../../src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql), a new
 migration class should be created under the org.apache.aurora.scheduler.storage.db.migration
 package. The class should implement the [MigrationScript](https://github.com/mybatis/migrations/blob/master/src/main/java/org/apache/ibatis/migration/MigrationScript.java)
-interface (see [V001_TestMigration](https://github.com/apache/aurora/blob/rel/0.19.1/src/test/java/org/apache/aurora/scheduler/storage/db/testmigration/V001_TestMigration.java)
+interface (see [V001_TestMigration](https://github.com/apache/aurora/blob/rel/0.20.0/src/test/java/org/apache/aurora/scheduler/storage/db/testmigration/V001_TestMigration.java)
 as an example). The upgrade and downgrade scripts are defined in this class. When restoring a
 snapshot the list of migrations on the classpath is compared to the list of applied changes in the
 DB. Any changes that have not yet been applied are executed and their downgrade script is stored

Modified: aurora/site/source/documentation/latest/development/thrift.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/thrift.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/development/thrift.md (original)
+++ aurora/site/source/documentation/latest/development/thrift.md Tue Apr  3 23:45:31 2018
@@ -6,7 +6,7 @@ client/server RPC protocol as well as fo
 correctly handling additions and renames of the existing members, field removals must be done
 carefully to ensure backwards compatibility and provide predictable deprecation cycle. This
 document describes general guidelines for making Thrift schema changes to the existing fields in
-[api.thrift](https://github.com/apache/aurora/blob/rel/0.19.1/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+[api.thrift](https://github.com/apache/aurora/blob/rel/0.20.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
 
 It is highly recommended to go through the
 [Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on
@@ -33,7 +33,7 @@ communicate with scheduler/client from v
 * Add a new field as an eventual replacement of the old one and implement a dual read/write
 anywhere the old field is used. If a thrift struct is mapped in the DB store make sure both columns
 are marked as `NOT NULL`
-* Check [storage.thrift](https://github.com/apache/aurora/blob/rel/0.19.1/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if
+* Check [storage.thrift](https://github.com/apache/aurora/blob/rel/0.20.0/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if
 the affected struct is stored in Aurora scheduler storage. If so, it's almost certainly also
 necessary to perform a [DB migration](../db-migration/).
 * Add a deprecation jira ticket into the vCurrent+1 release candidate

Modified: aurora/site/source/documentation/latest/features/custom-executors.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/custom-executors.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/custom-executors.md (original)
+++ aurora/site/source/documentation/latest/features/custom-executors.md Tue Apr  3 23:45:31 2018
@@ -145,9 +145,22 @@ Some information about launched tasks ca
 
 ### Using a custom executor
 
-At this time, it is not currently possible create a job that runs on a custom executor using the default
-Aurora client. To allow the scheduler to pick the correct executor, the `JobConfiguration.TaskConfig.ExecutorConfig.name`
-field must be set to match the name used in the custom executor configuration blob. (e.g. to run a job using myExecutor,
-`JobConfiguration.TaskConfig.ExecutorConfig.name` must be set to `myExecutor`). While support for modifying
-this field in Pystachio created, the easiest way to launch jobs with custom executors is to use
-an existing custom Client such as [gorealis](https://github.com/rdelval/gorealis).
+To launch tasks using a custom executor,
+an [ExecutorConfig](../../reference/configuration/#executorconfig-objects) object must be added to
+the Job or Service object. The `name` parameter of ExecutorConfig must match the name of an executor
+defined in the JSON object provided to the scheduler at startup time.
+
+For example, if we desire to launch tasks using `myExecutor` (defined above), we may do so in
+the following manner:
+
+```
+jobs = [Service(
+  task = task,
+  cluster = 'devcluster',
+  role = 'www-data',
+  environment = 'prod',
+  name = 'hello',
+  executor_config = ExecutorConfig(name='myExecutor'))]
+```
+
+This will create a Service Job which will launch tasks using myExecutor instead of Thermos.

Modified: aurora/site/source/documentation/latest/features/job-updates.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/job-updates.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/job-updates.md (original)
+++ aurora/site/source/documentation/latest/features/job-updates.md Tue Apr  3 23:45:31 2018
@@ -70,7 +70,7 @@ acknowledging ("heartbeating") job updat
 service updates where explicit job health monitoring is vital during the entire job update
 lifecycle. Such job updates would rely on an external service (or a custom client) periodically
 pulsing an active coordinated job update via a
-[pulseJobUpdate RPC](https://github.com/apache/aurora/blob/rel/0.19.1/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+[pulseJobUpdate RPC](https://github.com/apache/aurora/blob/rel/0.20.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
 
 A coordinated update is defined by setting a positive
 [pulse_interval_secs](../../reference/configuration/#updateconfig-objects) value in job configuration

Modified: aurora/site/source/documentation/latest/features/service-discovery.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/service-discovery.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/service-discovery.md (original)
+++ aurora/site/source/documentation/latest/features/service-discovery.md Tue Apr  3 23:45:31 2018
@@ -33,7 +33,8 @@ least the following:
  `_http_example.test.vagrant._tcp.aurora.mesos`, which includes IP address and every port. This should only
   be used if the service has one port.
 3. A SRV record `_{port-name}._http_example.test.vagrant._tcp.aurora.mesos` for each port name
-  defined. This should be used when the service has multiple ports.
+  defined. This should be used when the service has multiple ports. For this to work properly,
+  `-populate_discovery_info` must be added to the scheduler's configuration.
 
 Things to note:
 

Modified: aurora/site/source/documentation/latest/features/sla-metrics.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/sla-metrics.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/features/sla-metrics.md (original)
+++ aurora/site/source/documentation/latest/features/sla-metrics.md Tue Apr  3 23:45:31 2018
@@ -63,7 +63,7 @@ relevant to uptime calculations. By appl
 transition records, we can build a deterministic downtime trace for every given service instance.
 
 A task going through a state transition carries one of three possible SLA meanings
-(see [SlaAlgorithm.java](https://github.com/apache/aurora/blob/rel/0.19.1/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
+(see [SlaAlgorithm.java](https://github.com/apache/aurora/blob/rel/0.20.0/src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for
 sla-to-task-state mapping):
 
 * Task is UP: starts a period where the task is considered to be up and running from the Aurora
@@ -110,7 +110,7 @@ metric that helps track the dependency o
 * Per job - `sla_<job_key>_mtta_ms`
 * Per cluster - `sla_cluster_mtta_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Size are defined in:
-[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.19.1/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
+[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.20.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
   * By CPU:
     * `sla_cpu_small_mtta_ms`
     * `sla_cpu_medium_mtta_ms`
@@ -147,7 +147,7 @@ for a task.*
 * Per job - `sla_<job_key>_mtts_ms`
 * Per cluster - `sla_cluster_mtts_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Size are defined in:
-[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.19.1/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
+[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.20.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
   * By CPU:
     * `sla_cpu_small_mtts_ms`
     * `sla_cpu_medium_mtts_ms`
@@ -182,7 +182,7 @@ reflecting on the overall time it takes
 * Per job - `sla_<job_key>_mttr_ms`
 * Per cluster - `sla_cluster_mttr_ms`
 * Per instance size (small, medium, large, x-large, xx-large). Size are defined in:
-[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.19.1/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
+[ResourceBag.java](https://github.com/apache/aurora/blob/rel/0.20.0/src/main/java/org/apache/aurora/scheduler/resources/ResourceBag.java)
   * By CPU:
     * `sla_cpu_small_mttr_ms`
     * `sla_cpu_medium_mttr_ms`

Modified: aurora/site/source/documentation/latest/getting-started/vagrant.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/getting-started/vagrant.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/getting-started/vagrant.md (original)
+++ aurora/site/source/documentation/latest/getting-started/vagrant.md Tue Apr  3 23:45:31 2018
@@ -148,7 +148,7 @@ Most of the Vagrant related problems can
 
 If that still doesn't solve your problem, make sure to inspect the log files:
 
-* Scheduler: `/var/log/upstart/aurora-scheduler.log`
-* Observer: `/var/log/upstart/aurora-thermos-observer.log`
+* Scheduler: `/var/log/aurora/scheduler.log` or `sudo journalctl -u aurora-scheduler`
+* Observer: `/var/log/thermos/observer.log` or `sudo journalctl -u thermos-observer`
 * Mesos Master: `/var/log/mesos/mesos-master.INFO` (also see `.WARNING` and `.ERROR`)
 * Mesos Agent: `/var/log/mesos/mesos-slave.INFO` (also see `.WARNING` and `.ERROR`)

Modified: aurora/site/source/documentation/latest/operations/configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/operations/configuration.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/operations/configuration.md (original)
+++ aurora/site/source/documentation/latest/operations/configuration.md Tue Apr  3 23:45:31 2018
@@ -104,7 +104,7 @@ can furthermore help with storage perfor
 ### `-native_log_zk_group_path`
 ZooKeeper path used for Mesos replicated log quorum discovery.
 
-See [code](https://github.com/apache/aurora/blob/rel/0.19.1/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for
+See [code](https://github.com/apache/aurora/blob/rel/0.20.0/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for
 other available Mesos replicated log configuration options and default values.
 
 ### Changing the Quorum Size
@@ -167,7 +167,7 @@ the latter needs to be enabled via:
 
     -enable_revocable_ram=true
 
-Unless you want to use the [default](https://github.com/apache/aurora/blob/rel/0.19.1/src/main/resources/org/apache/aurora/scheduler/tiers.json)
+Unless you want to use the [default](https://github.com/apache/aurora/blob/rel/0.20.0/src/main/resources/org/apache/aurora/scheduler/tiers.json)
 tier configuration, you will also have to specify a file path:
 
     -tier_config=path/to/tiers/config.json

Modified: aurora/site/source/documentation/latest/reference/configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/configuration.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/configuration.md (original)
+++ aurora/site/source/documentation/latest/reference/configuration.md Tue Apr  3 23:45:31 2018
@@ -356,6 +356,9 @@ Job Schema
   ```tier``` | String | Task tier type. The default scheduler tier configuration allows for 3 tiers: `revocable`, `preemptible`, and `preferred`. If a tier is not elected, Aurora assigns the task to a tier based on its choice of `production` (that is `preferred` for production and `preemptible` for non-production jobs). See the section on [Configuration Tiers](../../features/multitenancy/#configuration-tiers) for more information.
   ```announce``` | ```Announcer``` object | Optionally enable Zookeeper ServerSet announcements. See [Announcer Objects] for more information.
   ```enable_hooks``` | Boolean | Whether to enable [Client Hooks](../client-hooks/) for this job. (Default: False)
+  ```partition_policy``` | ```PartitionPolicy``` object | An optional partition policy that allows job owners to define how to handle partitions for running tasks (in partition-aware Aurora clusters).
+  ```metadata``` | list of ```Metadata``` objects | User-defined metadata attached to the job, given as a list of ```Metadata``` objects.
+  ```executor_config``` | ```ExecutorConfig``` object | Allows choosing an alternative executor defined in `custom_executor_config` to be used instead of Thermos. Tasks will be launched with Thermos as the executor by default. See [Custom Executors](../../features/custom-executors/) for more info.
 
 
 ### UpdateConfig Objects
@@ -403,6 +406,29 @@ Parameters for controlling a task's heal
 | -------                        | :-------: | --------
 | ```shell_command```            | String    | An alternative to HTTP health checking. Specifies a shell command that will be executed. Any non-zero exit status will be interpreted as a health check failure.
 
+### PartitionPolicy Objects
+| param                          | type      | description
+| -------                        | :-------: | --------
+| ```reschedule```               | Boolean   | Whether or not to reschedule when running tasks become partitioned (Default: True)
+| ```delay_secs```               | Integer   | How long to delay transitioning to LOST when running tasks are partitioned. (Default: 0)
+
+### Metadata Objects
+
+Describes a piece of user metadata in a key value pair
+
+  param            | type            | description
+  -----            | :----:          | -----------
+  ```key```        | String          | Indicate which metadata the user provides
+  ```value```      | String          | Provide the metadata content for corresponding key
+
+### ExecutorConfig Objects
+
+Describes an Executor name and data to pass to the Mesos Task
+
+| param                          | type      | description
+| -------                        | :-------: | --------
+| ```name```                     | String    | Name of the executor to use for this task. Must match the name of an executor in `custom_executor_config` or Thermos (`AuroraExecutor`). (Default: AuroraExecutor)
+| ```data```                     | String    | Data blob to pass on to the executor. (Default: "")
 
 ### Announcer Objects
 
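The `PartitionPolicy` fields above interact with the task lifecycle rules described later in this commit (task-lifecycle.md). The following is a minimal Python sketch of that interaction, not scheduler code: the `PartitionPolicy` dataclass and the `seconds_until_lost` helper are hypothetical names introduced only for illustration, and the behavior is inferred from the documented defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartitionPolicy:
    """Hypothetical stand-in mirroring the documented fields and defaults."""
    reschedule: bool = True
    delay_secs: int = 0


def seconds_until_lost(policy: PartitionPolicy,
                       was_running: bool = True) -> Optional[int]:
    """Delay before a PARTITIONED task moves to LOST.

    Returns None when the task is never moved to LOST automatically.
    """
    if not was_running:
        # The policy only applies to running tasks; tasks in transient
        # states (updating, restarting, preempting) go to LOST at once.
        return 0
    if not policy.reschedule:
        # Rescheduling disabled: wait out the partition.
        return None
    return policy.delay_secs


default_policy = PartitionPolicy()
patient_policy = PartitionPolicy(delay_secs=300)
never_reschedule = PartitionPolicy(reschedule=False)
```

With the defaults, a partitioned running task is treated as `LOST` immediately, which matches the availability-first behavior described for partition-aware clusters. A job whose task failures are expensive would pick a larger `delay_secs` or set `reschedule` to false.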

Modified: aurora/site/source/documentation/latest/reference/observer-configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/observer-configuration.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/observer-configuration.md (original)
+++ aurora/site/source/documentation/latest/reference/observer-configuration.md Tue Apr  3 23:45:31 2018
@@ -29,6 +29,20 @@ Options:
   --task_disk_collection_interval_secs=TASK_DISK_COLLECTION_INTERVAL_SECS
                         The number of seconds between per task disk resource
                         collections. [default: 60]
+  --enable_mesos_disk_collector
+                        Delegate per task disk usage collection to agent.
+                        Should be enabled in conjunction to disk isolation in
+                        Mesos-agent. This is not compatible with an
+                        authenticated agent API. [default: False]
+  --agent_api_url=AGENT_API_URL
+                        Mesos Agent API url. [default:
+                        http://localhost:5051/containers]
+  --executor_id_json_path=EXECUTOR_ID_JSON_PATH
+                        `jmespath` to executor_id key in agent response json
+                        object. [default: executor_id]
+  --disk_usage_json_path=DISK_USAGE_JSON_PATH
+                        `jmespath` to disk usage bytes value in agent response
+                        json object. [default: statistics.disk_used_bytes]
 
   From module twitter.common.app:
     --app_daemonize     Daemonize this application. [default: False]
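
The two path options added above select values out of each entry in the agent's JSON response. The sketch below illustrates the idea under stated assumptions: the sample response shape is illustrative rather than captured from a real agent, and the hypothetical `lookup` helper handles only plain dotted key paths, a small subset of what the `jmespath` library supports.

```python
import json
from functools import reduce


def lookup(obj, dotted_path):
    """Follow a dotted key path such as 'statistics.disk_used_bytes'.

    Simplified stand-in for a jmespath search; supports nested keys only.
    """
    return reduce(lambda node, key: node[key], dotted_path.split("."), obj)


# Illustrative response shape (an assumption, not real agent output).
SAMPLE_RESPONSE = json.loads("""
[
  {"executor_id": "example-task-0",
   "statistics": {"disk_used_bytes": 1048576}},
  {"executor_id": "example-task-1",
   "statistics": {"disk_used_bytes": 2097152}}
]
""")


def disk_usage_by_executor(containers,
                           executor_id_path="executor_id",
                           disk_usage_path="statistics.disk_used_bytes"):
    """Map each executor id to its reported disk usage in bytes."""
    return {
        lookup(entry, executor_id_path): lookup(entry, disk_usage_path)
        for entry in containers
    }


usage = disk_usage_by_executor(SAMPLE_RESPONSE)
```

The default arguments mirror the documented defaults of `--executor_id_json_path` and `--disk_usage_json_path`; overriding them corresponds to pointing the observer at differently named keys in the response.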

Modified: aurora/site/source/documentation/latest/reference/scheduler-configuration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/scheduler-configuration.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/scheduler-configuration.md (original)
+++ aurora/site/source/documentation/latest/reference/scheduler-configuration.md Tue Apr  3 23:45:31 2018
@@ -176,6 +176,10 @@ Optional flags:
 	Maximum amount of random jitter to add to the offer hold time window.
 -offer_reservation_duration (default (3, mins))
 	Time to reserve a agent's offers while trying to satisfy a task preempting another.
+-offer_set_module (default [class org.apache.aurora.scheduler.offers.OfferSetModule])
+  Guice module for replacing offer holding and scheduling logic.
+-partition_aware (default false)
+  Whether or not to integrate with the partition-aware Mesos capabilities.
 -populate_discovery_info (default false)
 	If true, Aurora populates DiscoveryInfo field of Mesos TaskInfo.
 -preemption_delay (default (3, mins))
@@ -220,7 +224,7 @@ Optional flags:
 	Time for a stat to be retained in memory before expiring.
 -stat_sampling_interval (default (1, secs))
 	Statistic value sampling interval.
--task_assigner_modules (default [class org.apache.aurora.scheduler.state.FirstFitTaskAssignerModule])
+-task_assigner_modules (default [class org.apache.aurora.scheduler.scheduling.TaskAssignerImplModule])
   Guice modules for replacing task assignment logic.
 -thermos_executor_cpu (default 0.25)
 	The number of CPU cores to allocate for each instance of the executor.

Modified: aurora/site/source/documentation/latest/reference/task-lifecycle.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/task-lifecycle.md?rev=1828293&r1=1828292&r2=1828293&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/reference/task-lifecycle.md (original)
+++ aurora/site/source/documentation/latest/reference/task-lifecycle.md Tue Apr  3 23:45:31 2018
@@ -105,6 +105,31 @@ agent go into `LOST` state and new `Task
 From `PENDING` state, there is no guarantee a `Task` will be reassigned
 to the same machine unless job constraints explicitly force it there.
 
+
+## RUNNING to PARTITIONED states
+If Aurora is configured to enable partition awareness, a task which is in a
+running state can transition to `PARTITIONED`. This happens when the state
+of the task in Mesos becomes unknown. By default Aurora errs on the side of
+availability, so all tasks that transition to `PARTITIONED` are immediately
+transitioned to `LOST`.
+
+This policy is not ideal for all types of workloads you may wish to run in
+your Aurora cluster, e.g. for jobs where task failures are very expensive.
+Job owners may therefore set their own `PartitionPolicy` to control how
+long a task remains in `PARTITIONED` before transitioning to `LOST`, or to
+disable any automatic attempt to `reschedule` while in `PARTITIONED`,
+effectively waiting out the partition for as long as possible.
+
+
+## PARTITIONED and transient states
+The `PartitionPolicy` provided by users only applies to tasks which are
+currently running. When tasks are moving in and out of transient states,
+e.g. tasks being updated, restarted, preempted, etc., `PARTITIONED` tasks
+are moved immediately to `LOST`. This prevents situations where system
+or user-initiated actions are blocked indefinitely waiting for partitions
+that may never resolve.
+
+
 ### Giving Priority to Production Tasks: PREEMPTING
 
 Sometimes a Task needs to be interrupted, such as when a non-production