Posted to commits@samza.apache.org by ja...@apache.org on 2018/09/28 21:26:05 UTC

samza git commit: SAMZA-1925: Document frequently used configs

Repository: samza
Updated Branches:
  refs/heads/master d88a3cd5f -> 2c4b6d5c9


SAMZA-1925: Document frequently used configs

prateekm vjagadish
Please let me know if there are configs that should or should not be there, or descriptions that you would like me to change.

Screenshots:
![screen shot 2018-09-27 at 4 17 01 pm](https://user-images.githubusercontent.com/29577458/46179676-2f922500-c271-11e8-8d77-c43b33e9b0f3.png)
<img width="898" alt="screen shot 2018-09-27 at 4 17 22 pm" src="https://user-images.githubusercontent.com/29577458/46179677-2f922500-c271-11e8-95a2-b7117c392aa4.png">
![screen shot 2018-09-27 at 4 17 39 pm](https://user-images.githubusercontent.com/29577458/46179678-2f922500-c271-11e8-9a9b-4b09bf16e71d.png)
![screen shot 2018-09-27 at 4 17 54 pm](https://user-images.githubusercontent.com/29577458/46179679-2f922500-c271-11e8-9080-6553b3128b42.png)
![screen shot 2018-09-27 at 4 18 20 pm](https://user-images.githubusercontent.com/29577458/46179680-2f922500-c271-11e8-80cf-f9fb5d4e15b7.png)
![screen shot 2018-09-27 at 4 18 35 pm](https://user-images.githubusercontent.com/29577458/46179682-302abb80-c271-11e8-9a63-f7c6a7de16a6.png)
![screen shot 2018-09-27 at 4 18 45 pm](https://user-images.githubusercontent.com/29577458/46179683-302abb80-c271-11e8-96a2-eacec302a91f.png)
<img width="850" alt="screen shot 2018-09-27 at 4 19 02 pm" src="https://user-images.githubusercontent.com/29577458/46179684-302abb80-c271-11e8-822f-5142e9faf494.png">
![screen shot 2018-09-27 at 4 19 12 pm](https://user-images.githubusercontent.com/29577458/46179685-302abb80-c271-11e8-82a3-73d185cefab0.png)
![screen shot 2018-09-27 at 4 19 20 pm](https://user-images.githubusercontent.com/29577458/46179686-302abb80-c271-11e8-8a2c-b669498a75cc.png)
![screen shot 2018-09-27 at 4 19 29 pm](https://user-images.githubusercontent.com/29577458/46179687-30c35200-c271-11e8-8f92-f463b49b2293.png)
<img width="795" alt="screen shot 2018-09-27 at 4 19 39 pm" src="https://user-images.githubusercontent.com/29577458/46179688-30c35200-c271-11e8-810d-d014f63fb02d.png">

Author: Daniel Chen <dc...@linkedin.com>

Reviewers: Jagadish <ja...@apache.org>

Closes #671 from dxichen/config-doc-revamp


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/2c4b6d5c
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/2c4b6d5c
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/2c4b6d5c

Branch: refs/heads/master
Commit: 2c4b6d5c969772b8c84f1b99e73752843a0dab6c
Parents: d88a3cd
Author: Daniel Chen <dc...@linkedin.com>
Authored: Fri Sep 28 14:26:02 2018 -0700
Committer: Jagadish <jv...@linkedin.com>
Committed: Fri Sep 28 14:26:02 2018 -0700

----------------------------------------------------------------------
 docs/css/main.new.css                           |  58 +++----
 .../versioned/jobs/basic-configurations.md      | 168 +++++++++++++++++++
 .../versioned/jobs/configuration.md             |   2 +
 3 files changed, 200 insertions(+), 28 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza/blob/2c4b6d5c/docs/css/main.new.css
----------------------------------------------------------------------
diff --git a/docs/css/main.new.css b/docs/css/main.new.css
index 5b2a1e5..cd88e37 100644
--- a/docs/css/main.new.css
+++ b/docs/css/main.new.css
@@ -19,23 +19,23 @@
 
 /********************************************************************
  *
- * REFERENCE 
+ * REFERENCE
  *
  *******************************************************************/
- 
- /* 
- 
+
+ /*
+
  font-family: 'Barlow', sans-serif;
- 
+
  logo color red: #eb1c23
  logo color grey: #4f4f4f
  apache logo feather purple: #7a2c68
- 
+
  */
 
 /********************************************************************
  *
- * GLOBAL 
+ * GLOBAL
  *
  *******************************************************************/
 
@@ -142,7 +142,7 @@ a.side-navigation__group-title:hover::after {
 
 /********************************************************************
  *
- * NAVIGATION - MAIN 
+ * NAVIGATION - MAIN
  *
  *******************************************************************/
 
@@ -320,7 +320,7 @@ footer p {
 }
 
 .footer__items {
-  
+
 }
 
 .footer__item {
@@ -339,7 +339,7 @@ footer p {
   left: -30px;
 }
 
-footer .side-by-side { 
+footer .side-by-side {
   flex-direction: column;
 }
 
@@ -364,7 +364,7 @@ footer .side-by-side > * {
 
 /********************************************************************
  *
- * SECTION - HOME PAGE 
+ * SECTION - HOME PAGE
  *
  *******************************************************************/
 
@@ -396,7 +396,7 @@ footer .side-by-side > * {
   z-index: 20;
 }
 
-.section--highlight.section--bottom-flare::after { 
+.section--highlight.section--bottom-flare::after {
   border-top-color: #111;
 }
 
@@ -805,7 +805,7 @@ a.side-navigation__group-title::after {
 /****
  *
  * Markdown stuff
- * 
+ *
  ****/
 
 .page > .container {
@@ -828,18 +828,20 @@ a.side-navigation__group-title::after {
 table {
   border-collapse: collapse;
   margin: 1em 0;
-  font-size: 15px;
+  font-size : 12px;
+  font-family : "Myriad Web",Verdana,Helvetica,Arial,sans-serif;
 }
 
 table th, table td {
   text-align: left;
   vertical-align: top;
-  padding: 12px;
+  padding: 5px;
   border-bottom: 1px solid #ccc;
   border-top: 1px solid #ccc;
-  border-left: 0;
-  border-right: 0;
+  border-left: 1px solid #ccc;
+  border-right: 1px solid #ccc;
 }
+
 pre {
   padding: 20px;
   font-size: 15px;
@@ -903,7 +905,7 @@ figure, .page .content div.highlight {
 /****
  *
  * Releases List on Docs
- * 
+ *
  ****/
 
 .releases-list-divider {
@@ -967,7 +969,7 @@ figure, .page .content div.highlight {
 /****
  *
  * Breadcrumbs List, same as releases
- * 
+ *
  ****/
 
 .breadcrumbs-list-divider {
@@ -1044,7 +1046,7 @@ figure, .page .content div.highlight {
 /****
  *
  * CASE STUDIES
- * 
+ *
  ****/
 
 ul.case-studies {
@@ -1257,7 +1259,7 @@ ul.case-studies {
 /****
  *
  * POWERED BY
- * 
+ *
  ****/
 
  ul.powered-by {
@@ -1353,7 +1355,7 @@ ul.case-studies {
  **/
 
 .news__cards {
-  display: flex;  
+  display: flex;
 }
 
 .news__card {
@@ -1614,14 +1616,14 @@ ul.case-studies {
 }
 
 
-.talk-upcoming-tag i { 
+.talk-upcoming-tag i {
   margin-right: 20px;
   font-size: 20px;
 }
 
 .talk-upcoming-section {
   display: none;
-  
+
 }
 
 @media only screen and (min-width: 900px) {
@@ -1881,14 +1883,14 @@ ul.case-studies {
 }
 
 
-.meet-upcoming-tag i { 
+.meet-upcoming-tag i {
   margin-right: 20px;
   font-size: 20px;
 }
 
 .meet-upcoming-section {
   display: none;
-  
+
 }
 
 .meet--upcoming .meet-upcoming-section {
@@ -2098,7 +2100,7 @@ ul.case-studies {
  **/
 
 .pagination {
-  
+
 }
 
 .pagination.hide {
@@ -2145,7 +2147,7 @@ ul.case-studies {
 /****
  *
  * Committers
- * 
+ *
  ****/
 
 .committers {

http://git-wip-us.apache.org/repos/asf/samza/blob/2c4b6d5c/docs/learn/documentation/versioned/jobs/basic-configurations.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/jobs/basic-configurations.md b/docs/learn/documentation/versioned/jobs/basic-configurations.md
new file mode 100644
index 0000000..1f222c5
--- /dev/null
+++ b/docs/learn/documentation/versioned/jobs/basic-configurations.md
@@ -0,0 +1,168 @@
+---
+layout: page
+title: Basic Configurations
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+The following tables list common properties that should be included in a Samza job configuration file.<br>
+The full list of configurations can be found on the [Configuration Table](configuration-table.html) page.
+
+
+* [Application Configurations](#application-configurations)
+* [JobCoordinator Configurations](#jobcoordinator-configurations)
+  + [Cluster Deployment](#cluster-deployment)
+  + [Standalone Deployment](#standalone-deployment)
+* [Storage Configuration](#storage-configuration)
+* [Checkpointing](#checkpointing)
+* [System & Stream Configurations](#system--stream-configurations)
+  + [Kafka](#kafka)
+  + [HDFS](#hdfs)
+  + [EventHubs](#eventhubs)
+  + [Kinesis](#kinesis)
+  + [ElasticSearch](#elasticsearch)
+* [Metrics Configuration](#metrics-configuration)
+
+### Application Configurations
+These are the basic configurations for setting up a Samza application.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|app.name| |__Required:__ The name of your application.|
+|app.id|1|If you run several instances of your application at the same time, you need to give each instance a different app.id. This is important, since otherwise the applications will overwrite each others' checkpoints, and perhaps interfere with each other in other ways.|
+|app.class| |__Required:__ The application to run. The value is a fully-qualified Java classname, which must implement StreamApplication. A StreamApplication describes its processing logic as a series of transformations on message streams.|
+|job.factory.class| |__Required:__ The job factory to use for running this job. <br> The value is a fully-qualified Java classname, which must implement StreamJobFactory.<br> Samza ships with three implementations:<br><br>`org.apache.samza.job.local.ThreadJobFactory`<br>Runs your job on your local machine using threads. This is intended only for development, not for production deployments.<br><br>`org.apache.samza.job.local.ProcessJobFactory`<br>Runs your job on your local machine as a subprocess. An optional command builder property can also be specified (see task.command.class for details). This is intended only for development, not for production deployments.<br><br>`org.apache.samza.job.yarn.YarnJobFactory`<br>Runs your job on a YARN grid. See below for YARN-specific configuration.|
+|job.name| |__Required:__ The name of your job. This name appears on the Samza dashboard, and it is used to tell apart this job's checkpoints from other jobs' checkpoints.|
+|job.id|1|If you run several instances of your job at the same time, you need to give each execution a different job.id. This is important, since otherwise the jobs will overwrite each others' checkpoints, and perhaps interfere with each other in other ways.|
+|job.coordinator.system| |__Required:__ The system-name to use for creating and maintaining the Coordinator Stream.|
+|job.default.system| |The system-name to access any input or output streams for which the system is not explicitly configured. This property is for input and output streams, whereas job.coordinator.system is for Samza metadata streams.|
+|job.container.count|1|The number of YARN containers to request for running your job. This is the main parameter for controlling the scale (allocated computing resources) of your job: to increase the parallelism of processing, you need to increase the number of containers. The minimum is one container, and the maximum number of containers is the number of task instances (usually the number of input stream partitions). Task instances are evenly distributed across the number of containers that you specify.|
+|job.changelog.system| |This property specifies a default system for changelog, which will be used with the stream specified in stores.store-name.changelog config. You can override this system by specifying both the system and the stream in stores.store-name.changelog.|
+|job.coordination.utils.factory|org.apache.samza.zk.<br>ZkCoordinationUtilsFactory|Class to use to create CoordinationUtils. Currently available values are:<br><br>`org.apache.samza.zk.ZkCoordinationUtilsFactory`<br>ZooKeeper based coordination utils.<br><br>`org.apache.samza.coordinator.AzureCoordinationUtilsFactory`<br>Azure based coordination utils.<br><br>These coordination utils are currently used for intermediate stream creation.|
+|task.class| |__Required:__ The fully-qualified name of the Java class which processes incoming messages from input streams. The class must implement [StreamTask](../api/javadocs/org/apache/samza/task/StreamTask.html) or [AsyncStreamTask](../api/javadocs/org/apache/samza/task/AsyncStreamTask.html), and may optionally implement [InitableTask](../api/javadocs/org/apache/samza/task/InitableTask.html), [ClosableTask](../api/javadocs/org/apache/samza/task/ClosableTask.html) and/or [WindowableTask](../api/javadocs/org/apache/samza/task/WindowableTask.html). The class will be instantiated several times, once for every input stream partition.|
+|task.window.ms|-1|If task.class implements [WindowableTask](../api/javadocs/org/apache/samza/task/WindowableTask.html), it can receive a windowing callback at regular intervals. This property specifies the time between window() calls, in milliseconds. If the number is negative (the default), window() is never called. Note that Samza is [single-threaded](../container/event-loop.html), so a window() call will never occur concurrently with the processing of a message. If a message is being processed at the time when a window() call is due, the window() call occurs after the processing of the current message has completed.|
+|task.commit.ms|60000|If task.checkpoint.factory is configured, this property determines how often a checkpoint is written. The value is the time between checkpoints, in milliseconds. The frequency of checkpointing affects failure recovery: if a container fails unexpectedly (e.g. due to crash or machine failure) and is restarted, it resumes processing at the last checkpoint. Any messages processed since the last checkpoint on the failed container are processed again. Checkpointing more frequently reduces the number of messages that may be processed twice, but also uses more resources.|
+|task.log4j.system| |Specify the system name for the StreamAppender. If this property is not specified in the config, Samza throws an exception. (See [Stream Log4j Appender](logging.html#stream-log4j-appender)) Example: task.log4j.system=kafka|
+|serializers.registry.<br>**_serde-name_**.class| |Use this property to register a serializer/deserializer, which defines a way of encoding application objects as an array of bytes (used for messages in streams, and for data in persistent storage). You can give a serde any serde-name you want, and reference that name in properties like systems.*.samza.key.serde, systems.*.samza.msg.serde, streams.*.samza.key.serde, streams.*.samza.msg.serde, stores.*.key.serde and stores.*.msg.serde. The value of this property is the fully-qualified name of a Java class that implements SerdeFactory. Samza ships with several serdes:<br><br>`org.apache.samza.serializers.ByteSerdeFactory`<br>A no-op serde which passes through the undecoded byte array.<br><br>`org.apache.samza.serializers.ByteBufferSerdeFactory`<br>Encodes `java.nio.ByteBuffer` objects.<br><br>`org.apache.samza.serializers.IntegerSerdeFactory`<br>Encodes `java.lang.Integer` objects as binary (4 bytes fixed-length big-endian encoding).<br><br>`org.apache.samza.serializers.StringSerdeFactory`<br>Encodes `java.lang.String` objects as UTF-8.<br><br>`org.apache.samza.serializers.JsonSerdeFactory`<br>Encodes nested structures of `java.util.Map`, `java.util.List` etc. as JSON. Note: This Serde enforces a dash-separated property naming convention, while JsonSerdeV2 doesn't. This serde is primarily meant for Samza's internal usage, and is publicly available for backwards compatibility.<br><br>`org.apache.samza.serializers.JsonSerdeV2Factory`<br>Encodes nested structures of `java.util.Map`, `java.util.List` etc. as JSON. Note: This Serde uses Jackson's default (camelCase) property naming convention. This serde should be preferred over JsonSerde, especially in the High Level API, unless the dasherized naming convention is required (e.g., for backwards compatibility).<br><br>`org.apache.samza.serializers.LongSerdeFactory`<br>Encodes `java.lang.Long` as binary (8 bytes fixed-length big-endian encoding).<br><br>`org.apache.samza.serializers.DoubleSerdeFactory`<br>Encodes `java.lang.Double` as binary (8 bytes double-precision floating point).<br><br>`org.apache.samza.serializers.UUIDSerdeFactory`<br>Encodes `java.util.UUID` objects.<br><br>`org.apache.samza.serializers.SerializableSerdeFactory`<br>Encodes `java.io.Serializable` objects.<br><br>`org.apache.samza.serializers.MetricsSnapshotSerdeFactory`<br>Encodes `org.apache.samza.metrics.reporter.MetricsSnapshot` objects (which are used for reporting metrics) as JSON.<br><br>`org.apache.samza.serializers.KafkaSerdeFactory`<br>Adapter which allows existing `kafka.serializer.Encoder` and `kafka.serializer.Decoder` implementations to be used as Samza serdes. Set `serializers.registry.serde-name.encoder` and `serializers.registry.serde-name.decoder` to the appropriate class names.|
+
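+As an illustrative sketch, a minimal set of properties for a low-level (StreamTask) job might look like the following; the system, class, and serde names are placeholders, not defaults:
+
+```properties
+# Identity of the job
+job.name=page-view-counter
+job.id=1
+
+# Run on YARN
+job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
+
+# The task that processes incoming messages
+task.class=com.example.samza.PageViewCounterTask
+
+# System hosting Samza's internal metadata (coordinator) streams,
+# and the default system for streams that don't configure one
+job.coordinator.system=kafka
+job.default.system=kafka
+
+# Register a serde under the name "json" for use in other properties
+serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeV2Factory
+```
+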
+### JobCoordinator Configurations
+Samza supports both standalone and clustered ([YARN](yarn-jobs.html)) deployment models. Below are the configuration options for both models.
+##### Cluster Deployment
+|Name|Default|Description|
+|--- |--- |--- |
+|yarn.package.path| |__Required__ for YARN jobs: The URL from which the job package can be downloaded, for example an http:// or hdfs:// URL. The job package is a .tar.gz file with a specific directory structure.|
+|cluster-manager.container.memory.mb|1024|How much memory, in megabytes, to request from the cluster manager per container of your job. Along with cluster-manager.container.cpu.cores, this property determines how many containers the cluster manager will run on one machine. If the container exceeds this limit, it will be killed, so it is important that the container's actual memory use remains below the limit. The amount of memory used is normally the JVM heap size (configured with task.opts), plus the size of any off-heap memory allocation (for example stores.*.container.cache.size.bytes), plus a safety margin to allow for JVM overheads.|
+|cluster-manager.container.cpu.cores|1|The number of CPU cores to request per container of your job. Each node in the cluster has a certain number of CPU cores available, so this number (along with cluster-manager.container.memory.mb) determines how many containers can be run on one machine.|
+
+##### Standalone Deployment
+|Name|Default|Description|
+|--- |--- |--- |
+|job.coordinator.factory| |Class to use for job coordination. Currently available values are:<br><br>`org.apache.samza.standalone.PassthroughJobCoordinatorFactory`<br>Fixed partition mapping. No ZooKeeper.<br><br>`org.apache.samza.zk.ZkJobCoordinatorFactory`<br>ZooKeeper-based coordination.<br><br>`org.apache.samza.AzureJobCoordinatorFactory`<br>Azure-based coordination.<br><br>__Required__ only for non-cluster-managed applications.|
+|job.coordinator.zk.connect| |__Required__ for applications with ZooKeeper-based coordination. ZooKeeper coordinates (in "host:port[/znode]" format) to be used for coordination.|
+|azure.storage.connect| |__Required__ for applications with Azure-based coordination. This is the storage connection string associated with your Azure account. It is of the format: "DefaultEndpointsProtocol=https;AccountName=<Insert your account name>;AccountKey=<Insert your account key>"|
+
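+For example, a standalone deployment using ZooKeeper-based coordination might be configured as follows (the ZooKeeper address is a placeholder):
+
+```properties
+# Use ZooKeeper for coordination between processors
+job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
+# host:port[/znode] of the ZooKeeper ensemble
+job.coordinator.zk.connect=zk1.example.com:2181/samza
+```
+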
+### Storage Configuration
+These properties define Samza's storage mechanism for efficient [stateful stream processing](../container/state-management.html).
+
+|Name|Default|Description|
+|--- |--- |--- |
+|stores.**_store-name_**.factory| |You can give a store any **_store-name_** except `default` (`default` is reserved for defining default store parameters), and use that name to get a reference to the store in your stream task (call [TaskContext.getStore()](../api/javadocs/org/apache/samza/task/TaskContext.html#getStore(java.lang.String)) in your task's [init()](../api/javadocs/org/apache/samza/task/InitableTask.html#init(org.apache.samza.config.Config, org.apache.samza.task.TaskContext)) method). The value of this property is the fully-qualified name of a Java class that implements [StorageEngineFactory](../api/javadocs/org/apache/samza/storage/StorageEngineFactory.html). Samza currently ships with one storage engine implementation: <br><br>`org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory` <br>An on-disk storage engine with a key-value interface, implemented using [RocksDB](http://rocksdb.org/). It supports fast random-access reads and writes, as well as range queries on keys. RocksDB can be configured with various additional tuning parameters.|
+|stores.**_store-name_**.key.serde| |If the storage engine expects keys in the store to be simple byte arrays, this [serde](../container/serialization.html) allows the stream task to access the store using another object type as key. The value of this property must be a serde-name that is registered with serializers.registry.*.class. If this property is not set, keys are passed unmodified to the storage engine (and the changelog stream, if appropriate).|
+|stores.**_store-name_**.msg.serde| |If the storage engine expects values in the store to be simple byte arrays, this [serde](../container/serialization.html) allows the stream task to access the store using another object type as value. The value of this property must be a serde-name that is registered with serializers.registry.*.class. If this property is not set, values are passed unmodified to the storage engine (and the changelog stream, if appropriate).|
+|stores.**_store-name_**.changelog| |Samza stores are local to a container. If the container fails, the contents of the store are lost. To prevent loss of data, you need to set this property to configure a changelog stream: Samza then ensures that writes to the store are replicated to this stream, and the store is restored from this stream after a failure. The value of this property is given in the form system-name.stream-name. The "system-name" part is optional. If it is omitted you must specify the system in job.changelog.system config. Any output stream can be used as changelog, but you must ensure that only one job ever writes to a given changelog stream (each instance of a job and each store needs its own changelog stream).|
+|stores.**_store-name_**.rocksdb.ttl.ms| |The time-to-live of the store. Please note it's not a strict TTL limit (removed only after compaction). Please use caution opening a database with and without TTL, as it might corrupt the database. Please make sure to read the [constraints](https://github.com/facebook/rocksdb/wiki/Time-to-Live) before using.|
+
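+For example, a RocksDB-backed store with a Kafka changelog might be configured as follows; the store, serde, and stream names are illustrative, and the serdes must be registered via serializers.registry.*.class:
+
+```properties
+stores.my-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
+stores.my-store.key.serde=string
+stores.my-store.msg.serde=json
+# system-name.stream-name; the system part may be omitted if job.changelog.system is set
+stores.my-store.changelog=kafka.my-store-changelog
+```
+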
+### System & Stream Configurations
+Samza consumes and produces messages via [Streams](../container/streams.html), and supports a variety of Systems including Kafka, HDFS, Azure EventHubs, Kinesis and ElasticSearch.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|task.inputs| |__Required:__ A comma-separated list of streams that are consumed by this job. Each stream is given in the format system-name.stream-name. For example, if you have one input system called my-kafka, and want to consume two Kafka topics called PageViewEvent and UserActivityEvent, then you would set task.inputs=my-kafka.PageViewEvent, my-kafka.UserActivityEvent.|
+|systems.**_system-name_**.samza.factory| |__Required__: The fully-qualified name of a Java class which provides a system. A system can provide input streams which you can consume in your Samza job, or output streams to which you can write, or both. The requirements on a system are very flexible — it may connect to a message broker, or read and write files, or use a database, or anything else. The class must implement [SystemFactory](../api/javadocs/org/apache/samza/system/SystemFactory.html). Samza ships with the following implementations: <br><br>`org.apache.samza.system.kafka.KafkaSystemFactory` [(Configs)](#kafka)<br>`org.apache.samza.system.hdfs.HdfsSystemFactory` [(Configs)](#hdfs) <br>`org.apache.samza.system.eventhub.EventHubSystemFactory` [(Configs)](#eventhubs)<br>`org.apache.samza.system.kinesis.KinesisSystemFactory` [(Configs)](#kinesis)<br>`org.apache.samza.system.elasticsearch.ElasticsearchSystemFactory` [(Configs)](#elasticsearch)|
+|systems.**_system-name_**.default.stream.*| |A set of default properties for any stream associated with the system. For example, if "systems.kafka-system.default.stream.replication.factor"=2 was configured, then every Kafka stream created on the kafka-system will have a replication factor of 2 unless the property is explicitly overridden at the stream scope using streams properties.|
+|systems.**_system-name_**.default.stream.samza.key.serde| |The [serde](../container/serialization.html) which will be used to deserialize the key of messages on input streams, and to serialize the key of messages on output streams. This property defines the serde for all streams in the system. See the stream-scoped property to define the serde for an individual stream. If both are defined, the stream-level definition takes precedence. The value of this property must be a serde-name that is registered with serializers.registry.*.class. If this property is not set, messages are passed unmodified between the input stream consumer, the task and the output stream producer.|
+|systems.**_system-name_**.default.stream.samza.msg.serde| |The [serde](../container/serialization.html) which will be used to deserialize the value of messages on input streams, and to serialize the value of messages on output streams. This property defines the serde for all streams in the system. See the stream-scoped property to define the serde for an individual stream. If both are defined, the stream-level definition takes precedence. The value of this property must be a serde-name that is registered with serializers.registry.*.class. If this property is not set, messages are passed unmodified between the input stream consumer, the task and the output stream producer.|
+|systems.**_system-name_**.default.stream.samza.offset.default|`upcoming`|If a container starts up without a [checkpoint](../container/checkpointing.html), this property determines where in the input stream we should start consuming. The value must be an [OffsetType](../api/javadocs/org/apache/samza/system/SystemStreamMetadata.OffsetType.html), one of the following: <br><br>`upcoming` <br>Start processing messages that are published after the job starts. Any messages published while the job was not running are not processed. <br><br>`oldest` <br>Start processing at the oldest available message in the system, and [reprocess](reprocessing.html) the entire available message history. <br><br>This property is for all streams within a system. To set it for an individual stream, see streams.stream-id.samza.offset.default. If both are defined, the stream-level definition takes precedence.|
+|streams.**_stream-id_**.samza.system| |The system-name of the system on which this stream will be accessed. This property binds the stream to one of the systems defined with the property systems.system-name.samza.factory. If this property isn't specified, it is inherited from job.default.system.|
+|streams.**_stream-id_**.samza.physical.name| |The physical name of the stream on the system on which this stream will be accessed. This is opposed to the stream-id which is the logical name that Samza uses to identify the stream. A physical name could be a Kafka topic name, an HDFS file URN or any other system-specific identifier.|
+|streams.**_stream-id_**.samza.key.serde| |The [serde](../container/serialization.html) which will be used to deserialize the key of messages on input streams, and to serialize the key of messages on output streams. This property defines the serde for an individual stream. See the system-scoped property to define the serde for all streams within a system. If both are defined, the stream-level definition takes precedence. The value of this property must be a serde-name that is registered with serializers.registry.*.class. If this property is not set, messages are passed unmodified between the input stream consumer, the task and the output stream producer.|
+|streams.**_stream-id_**.samza.msg.serde| |The [serde](../container/serialization.html) which will be used to deserialize the value of messages on input streams, and to serialize the value of messages on output streams. This property defines the serde for an individual stream. See the system-scoped property to define the serde for all streams within a system. If both are defined, the stream-level definition takes precedence. The value of this property must be a serde-name that is registered with serializers.registry.*.class. If this property is not set, messages are passed unmodified between the input stream consumer, the task and the output stream producer.|
+|streams.**_stream-id_**.samza.offset.default|`upcoming`|If a container starts up without a [checkpoint](../container/checkpointing.html), this property determines where in the input stream we should start consuming. The value must be an [OffsetType](../api/javadocs/org/apache/samza/system/SystemStreamMetadata.OffsetType.html), one of the following: <br><br>`upcoming` <br>Start processing messages that are published after the job starts. Any messages published while the job was not running are not processed. <br><br>`oldest` <br>Start processing at the oldest available message in the system, and [reprocess](reprocessing.html) the entire available message history. <br><br>This property is for an individual stream. To set it for all streams within a system, see systems.system-name.samza.offset.default. If both are defined, the stream-level definition takes precedence.|
+|task.broadcast.inputs| |This property specifies the partitions that all tasks should consume. The systemStreamPartitions you put here will be sent to all the tasks. <br>Format: system-name.stream-name#partitionId or system-name.stream-name#[startingPartitionId-endingPartitionId] <br>Example: task.broadcast.inputs=mySystem.broadcastStream#[0-2], mySystem.broadcastStream#0|
+
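+For example, a logical stream-id can be bound to a physical Kafka topic as follows (all names are illustrative):
+
+```properties
+# The stream-id "page-views" refers to the Kafka topic "PageViewEvent"
+streams.page-views.samza.system=kafka
+streams.page-views.samza.physical.name=PageViewEvent
+# Without a checkpoint, start from the oldest available message
+streams.page-views.samza.offset.default=oldest
+```
+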
+##### Kafka
+Configs for consuming and producing to [Apache Kafka](https://kafka.apache.org/). This section applies if you have set systems.*.samza.factory = `org.apache.samza.system.kafka.KafkaSystemFactory`
+Samples can be found [here](../../../../startup/hello-samza/versioned).
+
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.consumer.zookeeper.connect| |The hostname and port of one or more Zookeeper nodes where information about the Kafka cluster can be found. This is given as a comma-separated list of hostname:port pairs, such as zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181. If the cluster information is at some sub-path of the Zookeeper namespace, you need to include the path at the end of the list of hostnames, for example: zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/clusters/my-kafka|
+|systems.**_system-name_**.consumer.auto.offset.reset|`largest`|This setting determines what happens if a consumer attempts to read an offset that is outside of the current valid range. This could happen if the topic does not exist, or if a checkpoint is older than the maximum message history retained by the brokers. This property is not to be confused with systems.*.samza.offset.default, which determines what happens if there is no checkpoint. The following are valid values for auto.offset.reset: <br><br>`smallest` <br>Start consuming at the smallest (oldest) offset available on the broker (process as much message history as available). <br><br>`largest` <br>Start consuming at the largest (newest) offset available on the broker (skip any messages published while the job was not running). <br><br>anything else <br>Throw an exception and refuse to start up the job.|
+|systems.**_system-name_**.producer.bootstrap.servers| | A list of network endpoints where the Kafka brokers are running. This is given as a comma-separated list of hostname:port pairs, for example kafka1.example.com:9092,kafka2.example.com:9092,kafka3.example.com:9092. It's not necessary to list every single Kafka node in the cluster: Samza uses this property in order to discover which topics and partitions are hosted on which broker. This property is needed even if you are only consuming from Kafka, and not writing to it, because Samza uses it to discover metadata about streams being consumed.|
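+
+Putting these together, a Kafka system definition might look like the following sketch (hosts and ports are placeholders):
+
+```properties
+systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
+systems.kafka.consumer.zookeeper.connect=zk1.example.com:2181/clusters/my-kafka
+systems.kafka.producer.bootstrap.servers=kafka1.example.com:9092,kafka2.example.com:9092
+# Process as much message history as available when an offset is out of range
+systems.kafka.consumer.auto.offset.reset=smallest
+```
+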
+##### HDFS
+Configs for [consuming](../hadoop/consumer.html) and [producing](../hadoop/producer.html) to [HDFS](https://hortonworks.com/apache/hdfs/). This section applies if you have set systems.*.samza.factory = `org.apache.samza.system.hdfs.HdfsSystemFactory`
+More about batch processing can be found [here](../hadoop/overview.html).
+
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.producer.hdfs.base.output.dir|/user/USERNAME/SYSTEMNAME|The base output directory for HDFS writes. Defaults to the home directory of the user who ran the job, followed by the systemName for this HdfsSystemProducer as defined in the job.properties file.|
+|systems.**_system-name_**.producer.hdfs.writer.class|`org.apache.samza.system.hdfs.writer.`<br>`BinarySequenceFileHdfsWriter`|Fully-qualified class name of the HdfsWriter implementation this HDFS Producer system should use|
+|systems.**_system-name_**.stagingDirectory|Inherit from yarn.job.staging.directory if set|Staging directory for storing partition description. By default (if not set by users) the value is inherited from "yarn.job.staging.directory" internally. The default value is typically good enough unless you want to explicitly use a separate location.|
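+
+As an illustrative sketch (the system name and output directory are placeholders):
+
+```properties
+systems.hdfs.samza.factory=org.apache.samza.system.hdfs.HdfsSystemFactory
+systems.hdfs.producer.hdfs.writer.class=org.apache.samza.system.hdfs.writer.BinarySequenceFileHdfsWriter
+systems.hdfs.producer.hdfs.base.output.dir=/user/me/analytics
+```
+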
+##### EventHubs
+Configs for consuming and producing to [Azure EventHubs](https://azure.microsoft.com/en-us/services/event-hubs/). This section applies if you have set systems.*.samza.factory = `org.apache.samza.system.eventhub.EventHubSystemFactory`
+Documentation and samples can be found [here](../azure/eventhubs.html).
+
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.stream.list| |List of Samza **_stream-ids_** used for the EventHubs system.|
+|streams.**_stream-id_**.eventhubs.namespace| |Namespace of the associated stream-id. __Required__ to access the EventHubs entity per stream.|
+|streams.**_stream-id_**.eventhubs.entitypath| |Entity of the associated stream-id. __Required__ to access the EventHubs entity per stream.|
+|sensitive.streams.**_stream-id_**.eventhubs.sas.keyname| |SAS keyname of the associated stream-id. __Required__ to access the EventHubs entity per stream.|
+|sensitive.streams.**_stream-id_**.eventhubs.sas.token| |SAS token of the associated stream-id. __Required__ to access the EventHubs entity per stream.|
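+
+For example, a single EventHubs stream might be configured as follows (namespace, entity, and credentials are placeholders):
+
+```properties
+systems.eventhubs.samza.factory=org.apache.samza.system.eventhub.EventHubSystemFactory
+systems.eventhubs.stream.list=input0
+streams.input0.eventhubs.namespace=my-namespace
+streams.input0.eventhubs.entitypath=my-entity
+sensitive.streams.input0.eventhubs.sas.keyname=my-sas-keyname
+sensitive.streams.input0.eventhubs.sas.token=my-sas-token
+```
+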
+##### Kinesis
+Configs for consuming and producing to [Amazon Kinesis](https://aws.amazon.com/kinesis/). This section applies if you have set systems.*.samza.factory = `org.apache.samza.system.kinesis.KinesisSystemFactory`
+Documentation and samples can be found [here](../aws/kinesis.html).
+
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.streams.**_stream-name_**.aws.region| |Region of the associated stream-name. __Required__ to access the Kinesis data stream.|
+|systems.**_system-name_**.streams.**_stream-name_**.aws.accessKey| |AccessKey of the associated stream-name. __Required__ to access the Kinesis data stream.|
+|systems.**_system-name_**.streams.**_stream-name_**.aws.secretKey| |SecretKey of the associated stream-name. __Required__ to access the Kinesis data stream.|
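+
+For example (region and credentials are placeholders):
+
+```properties
+systems.kinesis.samza.factory=org.apache.samza.system.kinesis.KinesisSystemFactory
+systems.kinesis.streams.my-stream.aws.region=us-west-2
+systems.kinesis.streams.my-stream.aws.accessKey=MY_ACCESS_KEY
+systems.kinesis.streams.my-stream.aws.secretKey=MY_SECRET_KEY
+```
+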
+##### ElasticSearch
+Configs for producing to [ElasticSearch](https://www.elastic.co/products/elasticsearch). This section applies if you have set systems.*.samza.factory = `org.apache.samza.system.elasticsearch.ElasticsearchSystemFactory`
+
+|Name|Default|Description|
+|--- |--- |--- |
+|systems.**_system-name_**.client.factory| |__Required:__ The Elasticsearch client factory used for connecting to the Elasticsearch cluster. Samza ships with the following implementations:<br><br>`org.apache.samza.system.elasticsearch.client.TransportClientFactory`<br>Creates a TransportClient that connects to the cluster remotely without joining it. This requires the transport host and port properties to be set.<br><br>`org.apache.samza.system.elasticsearch.client.NodeClientFactory`<br>Creates a Node client that connects to the cluster by joining it. By default this uses zen discovery to find the cluster but other methods can be configured.|
+|systems.**_system-name_**.index.request.factory|`org.apache.samza.system.`<br>`elasticsearch.indexrequest.`<br>`DefaultIndexRequestFactory`|The index request factory that converts the Samza OutgoingMessageEnvelope into the IndexRequest to be sent to Elasticsearch. The default `IndexRequestFactory` behaves as follows:<br><br>`Stream name`<br>The stream name is of the format {index-name}/{type-name}, which maps onto the Elasticsearch index and type.<br><br>`Message id`<br>If the message has a key, this is set as the document id; otherwise Elasticsearch will generate one for each document.<br><br>`Partition id`<br>If the partition key is set, then this is used as the Elasticsearch routing key.<br><br>`Message`<br>The message must be either a byte[], which is passed directly on to Elasticsearch, or a Map, which is passed on to the Elasticsearch client and serialized into a JSON String. Samza serdes are not currently supported.|
+|systems.**_system-name_**.client.transport.host| |__Required__ for TransportClientFactory: the hostname that the transport client connects to.|
+|systems.**_system-name_**.client.transport.port| |__Required__ for TransportClientFactory: the port that the transport client connects to.|
+
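+For example, connecting with a transport client might look like this sketch (host and port are placeholders):
+
+```properties
+systems.es.samza.factory=org.apache.samza.system.elasticsearch.ElasticsearchSystemFactory
+systems.es.client.factory=org.apache.samza.system.elasticsearch.client.TransportClientFactory
+systems.es.client.transport.host=es1.example.com
+systems.es.client.transport.port=9300
+```
+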
+### Checkpointing
+[Checkpointing](../container/checkpointing.html) is not required, but recommended for most jobs. If you don't configure checkpointing, and a job or container restarts, it does not remember which messages it has already processed. Without checkpointing, consumer behavior on startup is determined by the ...samza.offset.default setting. Checkpointing allows a job to start up where it previously left off.
+
+|Name|Default|Description|
+|--- |--- |--- |
+|task.checkpoint.factory| |To enable [checkpointing](../container/checkpointing.html), you must set this property to the fully-qualified name of a Java class that implements [CheckpointManagerFactory](../api/javadocs/org/apache/samza/checkpoint/CheckpointManagerFactory.html). Samza ships with two checkpoint managers by default: <br><br>`org.apache.samza.checkpoint.file.FileSystemCheckpointManagerFactory` <br>Writes checkpoints to files on the local filesystem. You can configure the file path with the task.checkpoint.path property. This is a simple option if your job always runs on the same machine. On a multi-machine cluster, this would require a network filesystem mount. <br><br>`org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory` <br>Writes checkpoints to a dedicated topic on a Kafka cluster. This is the recommended option if you are already using Kafka for input or output streams. Use the task.checkpoint.system property to configure which Kafka cluster to use for checkpoints.|
+|task.checkpoint.system| |This property is required if you are using Kafka for checkpoints (task.checkpoint.factory = `org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory`). You must set it to the system-name of a Kafka system. The stream name (topic name) within that system is automatically determined from the job name and ID: __samza_checkpoint_${job.name}_${job.id} (with underscores in the job name and ID replaced by hyphens).|
+
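+For example, Kafka-backed checkpointing, written every 10 seconds, might be configured as follows (the system name is a placeholder):
+
+```properties
+task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
+task.checkpoint.system=kafka
+# How often to write checkpoints, in milliseconds (default: 60000)
+task.commit.ms=10000
+```
+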
+### Metrics Configuration
+|Name|Default|Description|
+|--- |--- |--- |
+|metrics.reporter.**_reporter-name_**.class| |Samza automatically tracks various metrics which are useful for monitoring the health of a job, and you can also track your own metrics. With this property, you can define any number of metrics reporters which send the metrics to a system of your choice (for graphing, alerting etc). You give each reporter an arbitrary reporter-name. To enable the reporter, you need to reference the reporter-name in metrics.reporters. The value of this property is the fully-qualified name of a Java class that implements MetricsReporterFactory. Samza ships with these implementations by default: <br><br>`org.apache.samza.metrics.reporter.JmxReporterFactory`<br>With this reporter, every container exposes its own metrics as JMX MBeans. The JMX server is started on a random port to avoid collisions between containers running on the same machine.<br><br>`org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory`<br>This reporter sends the latest values of all metrics as messages to an output stream once per minute. The output stream is configured with metrics.reporter.*.stream and it can use any system supported by Samza.|
+|metrics.reporters| |If you have defined any metrics reporters with metrics.reporter.*.class, you need to list them here in order to enable them. The value of this property is a comma-separated list of reporter-name tokens.|
+|metrics.reporter.**_reporter-name_**.stream| |If you have registered the metrics reporter metrics.reporter.*.class = `org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory`, you need to set this property to configure the output stream to which the metrics data should be sent. The stream is given in the form system-name.stream-name, and the system must be defined in the job configuration. It's fine for many different jobs to publish their metrics to the same metrics stream. Samza defines a simple JSON encoding for metrics; in order to use this encoding, you also need to configure a serde for the metrics stream: <br><br>streams.*.samza.msg.serde = `metrics-serde` (replacing the asterisk with the stream-name of the metrics stream) <br>serializers.registry.metrics-serde.class = `org.apache.samza.serializers.MetricsSnapshotSerdeFactory` (registering the serde under a serde-name of metrics-serde)|
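+
+For example, reporting metrics snapshots to a Kafka stream might look like the following sketch (the reporter name and stream are placeholders):
+
+```properties
+metrics.reporters=snapshot
+metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
+# system-name.stream-name to which metrics are published
+metrics.reporter.snapshot.stream=kafka.metrics
+# Configure the serde for the metrics stream
+streams.metrics.samza.msg.serde=metrics-serde
+serializers.registry.metrics-serde.class=org.apache.samza.serializers.MetricsSnapshotSerdeFactory
+```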
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/samza/blob/2c4b6d5c/docs/learn/documentation/versioned/jobs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/jobs/configuration.md b/docs/learn/documentation/versioned/jobs/configuration.md
index c0ebb69..3511469 100644
--- a/docs/learn/documentation/versioned/jobs/configuration.md
+++ b/docs/learn/documentation/versioned/jobs/configuration.md
@@ -56,6 +56,8 @@ Configuration keys that absolutely must be defined for a Samza job are:
 * `task.class`
 * `task.inputs`
 
+See the [Basic Configurations](basic-configurations.html) page to get started.
+
 ### Configuration Keys
 
 A complete list of configuration keys can be found on the [Configuration Table](configuration-table.html) page.  Note