You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by cr...@apache.org on 2014/07/09 18:37:02 UTC

svn commit: r1609232 [3/4] - in /incubator/samza/site: ./ community/ contribute/ css/ learn/documentation/0.7.0/ learn/documentation/0.7.0/api/ learn/documentation/0.7.0/comparisons/ learn/documentation/0.7.0/container/ learn/documentation/0.7.0/introd...

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/configuration-table.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/configuration-table.html?rev=1609232&r1=1609231&r2=1609232&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/configuration-table.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/configuration-table.html Wed Jul  9 16:37:01 2014
@@ -1,3 +1,4 @@
+<!DOCTYPE html>
 <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
@@ -15,231 +16,1111 @@
    limitations under the License.
 -->
 <html>
-  <body>
-    <table cellspacing="2" border="1" cellpadding="2">
-      <tbody>
-        <tr><th>Name</th><th>Default</th><th>Description</th></tr>
-        <tr>
-          <td><strong>Job</strong></td>
-          <td> </td>
-          <td> </td>
-        </tr>
-        <tr>
-          <td>job.factory.class</td>
-          <td>none</td>
-          <td><strong>Required:</strong> The job factory to use when running a task. This can be either samza.job.local.LocalJobFactory, or samza.job.yarn.YarnJobFactory.</td>
-        </tr>
-        <tr>
-          <td>job.name</td>
-          <td>none</td>
-          <td><strong>Required:</strong> The name of your job. This is the name that will appear on the Samza dashboard, when your job is running.</td>
-        </tr>
-        <tr>
-          <td>job.id</td>
-          <td>1</td>
-          <td>An ID string that is used to distinguish between multiple concurrent executions of the same Samza job.</td>
-        </tr>
-        <tr>
-          <td><strong>Task</strong></td>
-          <td> </td>
-          <td> </td>
-        </tr>
-        <tr>
-          <td>task.class</td>
-          <td>none</td>
-          <td><strong>Required:</strong> The package name of the StreamTask to execute. For example, samza.task.example.MyStreamerTask.</td>
-        </tr>
-        <tr>
-          <td>task.execute</td>
-          <td>bin/run-container.sh</td>
-          <td>The command that a StreamJob should invoke to start the TaskRunner.</td>
-        </tr>
-        <tr>
-          <td>task.message.buffer.size</td>
-          <td>10000</td>
-          <td>The number of messages that the TaskRunner will buffer in the event loop queue, before it begins blocking StreamConsumers.</td>
-        </tr>
-        <tr>
-          <td>task.inputs</td>
-          <td>none</td>
-          <td><strong>Required:</strong> A CSV list of stream names that the TaskRunner should use to read messages from for your StreamTasks (e.g. page-view-event,service-metrics).</td>
-        </tr>
-        <tr>
-          <td>task.window.ms</td>
-          <td>-1</td>
-          <td>How often the TaskRunner should call window() on a WindowableTask. A negative number tells the TaskRunner to never call window().</td>
-        </tr>
-        <tr>
-          <td>task.commit.ms</td>
-          <td>60000</td>
-          <td>How often the TaskRunner should call writeCheckpoint for a partition.</td>
-        </tr>
-        <tr>
-          <td>task.command.class</td>
-          <td>samza.task.ShellCommandBuilder</td>
-          <td>The class to use to build environment variables for the task.execute command.</td>
-        </tr>
-        <tr>
-          <td>task.lifecycle.listeners</td>
-          <td>none</td>
-          <td>A CSV list of lifecycle listener names that the TaskRunner notify when lifecycle events occur (e.g. my-lifecycle-manager).</td>
-        </tr>
-        <tr>
-          <td>task.lifecycle.listener.%s.class</td>
-          <td>none</td>
-          <td>The class name for a lifecycle listener factory (e.g. task.lifecycle.listener.my-lifecycle-manager.class=foo.bar.MyLifecycleManagerFactory)</td>
-        </tr>
-        <tr>
-          <td>task.checkpoint.factory</td>
-          <td>none</td>
-          <td>The class name for the checkpoint manager to use (e.g. samza.task.state.KafkaCheckpointManagerFactory)</td>
-        </tr>
-        <tr>
-          <td>task.checkpoint.failure.retry.ms</td>
-          <td>10000</td>
-          <td>If readLastCheckpoint, or writeCheckpoint fails, the TaskRunner will wait this interval before retrying the checkpoint.</td>
-        </tr>
-        <tr>
-          <td colspan="1">task.opts</td>
-          <td colspan="1">none</td>
-          <td colspan="1">JVM options that should be attached to each JVM that is running StreamTasks. If you wish to reference the log directory from this parameter, use logs/. <span>If you wish to reference code in the Samza job's TGZ package use __package/.</span></td>
-        </tr>
-        <tr>
-          <td><strong>System</strong></td>
-          <td> </td>
-          <td> </td>
-        </tr>
-        <tr>
-          <td>systems.%s.samza.consumer.factory</td>
-          <td>none</td>
-          <td>The StreamConsumerFactory class to use when creating a new StreamConsumer for this system (e.g. samza.stream.kafka.KafkaConsumerFactory).</td>
-        </tr>
-        <tr>
-          <td>systems.%s.samza.producer.factory</td>
-          <td>none</td>
-          <td>The StreamProducerFactory class to use when creating a new StreamProducer for this system (e.g. samza.stream.kafka.KafkaProducerFactory).</td>
-        </tr>
-        <tr>
-          <td>systems.%s.samza.partition.manager</td>
-          <td>none</td>
-          <td>The PartitionManager class to use when fetching partition information about streams for the system (e.g. samza.stream.kafka.KafkaPartitionManager).</td>
-        </tr>
-        <tr>
-          <td>systems.%s.producer.reconnect.interval.ms</td>
-          <td>10000</td>
-          <td>If a producer fails, the TaskRunner will wait this interval before retrying.</td>
-        </tr>
-        <tr>
-          <td>systems.%s.*</td>
-          <td>none</td>
-          <td>For both Kafka and Databus, any configuration you supply under this namespace will be given to the underlying Kafka consumer/producer, and Databus consumer/producer. This is useful for configuring things like autooffset.reset, socket buffer size, fetch size, batch size, etc.</td>
-        </tr>
-        <tr>
-          <td><strong>Stream</strong></td>
-          <td> </td>
-          <td> </td>
-        </tr>
-        <tr>
-          <td>streams.%s.system</td>
-          <td>none</td>
-          <td>The name of the system associated with this stream (e.g. kafka-aggregate-tracking). This name must match with a system defined in the configuration file.</td>
-        </tr>
-        <tr>
-          <td>streams.%s.stream</td>
-          <td>none</td>
-          <td>The name of the stream in the system (e.g. PageViewEvent).</td>
-        </tr>
-        <tr>
-          <td>streams.%s.serde</td>
-          <td>none</td>
-          <td>The serde to use to serialize and deserialize messages for this stream. If undefined, the TaskRunner will try to fall back to the default serde, if it's defined.</td>
-        </tr>
-        <tr>
-          <td>streams.%s.consumer.reset.offset</td>
-          <td>false</td>
-          <td>If set to true, the TaskRunner will ignore the last checkpoint offset for this stream, and use null as the offset for the stream instead. In the case of Kafka's consumer, it will fall back to autooffset.reset. In the case of Databus' consumer, it will fall back to SCN 0.</td>
-        </tr>
-        <tr>
-          <td>streams.%s.consumer.failure.retry.ms</td>
-          <td>10000</td>
-          <td>If a StreamConsumer fails, the TaskRunner will wait this interval before retrying.</td>
-        </tr>
-        <tr>
-          <td>streams.%s.consumer.max.bytes.per.sec</td>
-          <td>none</td>
-          <td>The maximum number of bytes that the TaskRunner will allow from all partitions that it's reading for this stream. For example, if you have an input stream with two partitions, and 1 MB/sec max, then the maximum bytes the TaskRunner will read per second from all of the input stream's partitions is 1 MB/sec.</td>
-        </tr>
-        <tr>
-          <td>streams.%s.producer.reconnect.interval.ms</td>
-          <td>10000</td>
-          <td>If a StreamProducer fails, the TaskRunner will wait this interval before retrying.</td>
-        </tr>
-        <tr>
-          <td><strong>Serdes</strong></td>
-          <td> </td>
-          <td> </td>
-        </tr>
-        <tr>
-          <td>serializers.registry.%s.class</td>
-          <td>none</td>
-          <td>The name of a class that implements both SerializerFactory and DeserializerFactory (e.g. serializers.registry.json.class=samza.serializers.JsonSerdeFactory).</td>
-        </tr>
-        <tr>
-          <td>serializers.default</td>
-          <td>none</td>
-          <td>The default serde to use, if one is not defined for an input or output stream (e.g. serializers.default=json).</td>
-        </tr>
-        <tr>
-          <td><strong>YARN</strong></td>
-          <td> </td>
-          <td> </td>
-        </tr>
-        <tr>
-          <td>yarn.package.path</td>
-          <td>none</td>
-          <td>The tgz location of your Samza job. This tgz file is well structured. See the YARN section for details.</td>
-        </tr>
-        <tr>
-          <td>yarn.container.memory.mb</td>
-          <td>1024</td>
-          <td>How much memory to ask for (per-container), when Samza is starting a YARN container.</td>
-        </tr>
-        <tr>
-          <td>yarn.am.container.memory.mb</td>
-          <td>1024</td>
-          <td>How much memory to ask for (per-application-master), when Samza is starting a YARN container.</td>
-        </tr>
-        <tr>
-          <td>yarn.container.count</td>
-          <td>1</td>
-          <td>How many containers to start when a Samza job is started in YARN. Partitions are divided evenly among the containers.</td>
-        </tr>
-        <tr>
-          <td colspan="1">yarn.am.opts</td>
-          <td colspan="1">none</td>
-          <td colspan="1"><span>JVM options that should be attached to each JVM that is running the ApplicationMaster. If you wish to reference the log directory from this parameter, use logs/. If you wish to reference code in the Samza job's TGZ package use __package/.</span></td>
-        </tr>
-        <tr>
-          <td><strong>Metrics</strong></td>
-          <td> </td>
-          <td> </td>
-        </tr>
-        <tr>
-          <td>metrics.reporter.%s.class</td>
-          <td>none</td>
-          <td>The package and class for a metrics reporter (e.g. metrics.reporter.foo-bar.class=samza.metrics.reporter.MetricsSnapshotReporter).</td>
-        </tr>
-        <tr>
-          <td>metrics.reporter.%s.window.ms</td>
-          <td>10000</td>
-          <td>How often the TaskRunner tells the metrics reporter to send update or send its metrics.</td>
-        </tr>
-        <tr>
-          <td>metrics.reporters</td>
-          <td>none</td>
-          <td>A CSV list of metric reporter names (e.g. metrics.reporters=foo-bar).</td>
-        </tr>
-      </tbody>
-    </table>
-  </body>
+    <head>
+        <meta charset="utf-8">
+        <title>Samza Configuration Reference</title>
+        <style type="text/css">
+            body {
+                font-family: "Helvetica Neue",Helvetica,Arial,sans-serif;
+                font-size: 14px;
+                line-height: 22px;
+                color: #333;
+                background-color: #fff;
+            }
+
+            table {
+                border-collapse: collapse;
+                margin: 1em 0;
+            }
+
+            table th, table td {
+                text-align: left;
+                vertical-align: top;
+                padding: 12px;
+                border-bottom: 1px solid #ccc;
+                border-top: 1px solid #ccc;
+                border-left: 0;
+                border-right: 0;
+            }
+
+            table td.property, table td.default {
+                white-space: nowrap;
+            }
+
+            table th.section {
+                background-color: #eee;
+            }
+
+            table th.section .subtitle {
+                font-weight: normal;
+            }
+
+            code, a.property {
+                font-family: monospace;
+            }
+
+            span.system, span.stream, span.store, span.serde, span.rewriter, span.listener, span.reporter {
+                padding: 1px;
+                margin: 1px;
+                border-width: 1px;
+                border-style: solid;
+                border-radius: 4px;
+            }
+
+            span.system {
+                background-color: #ddf;
+                border-color: #bbd;
+            }
+
+            span.stream {
+                background-color: #dfd;
+                border-color: #bdb;
+            }
+
+            span.store {
+                background-color: #fdf;
+                border-color: #dbd;
+            }
+
+            span.serde {
+                background-color: #fdd;
+                border-color: #dbb;
+            }
+
+            span.rewriter {
+                background-color: #eee;
+                border-color: #ccc;
+            }
+
+            span.listener {
+                background-color: #ffd;
+                border-color: #ddb;
+            }
+
+            span.reporter {
+                background-color: #dff;
+                border-color: #bdd;
+            }
+        </style>
+    </head>
+
+    <body>
+        <h1>Samza Configuration Reference</h1>
+        <p>The following table lists all the standard properties that can be included in a Samza job configuration file.</p>
+        <p>Words highlighted like <span class="system">this</span> are placeholders for your own variable names.</p>
+        <table>
+            <tbody>
+                <tr><th>Name</th><th>Default</th><th>Description</th></tr>
+                <tr>
+                    <th colspan="3" class="section" id="job"><a href="configuration.html">Samza job configuration</a></th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="job-factory-class">job.factory.class</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        <strong>Required:</strong> The <a href="job-runner.html">job factory</a> to use for running this job.
+                        The value is a fully-qualified Java classname, which must implement
+                        <a href="../api/javadocs/org/apache/samza/job/StreamJobFactory.html">StreamJobFactory</a>.
+                        Samza ships with two implementations:
+                        <dl>
+                            <dt><code>org.apache.samza.job.local.LocalJobFactory</code></dt>
+                            <dd>Runs your job on your local machine, either using threads or as a subprocess (see the
+                                <a href="#task-command-class" class="property">task.command.class</a> property for
+                                details). This is intended only for development, not for production deployments.</dd>
+                            <dt><code>org.apache.samza.job.yarn.YarnJobFactory</code></dt>
+                            <dd>Runs your job on a YARN grid. See <a href="#yarn">below</a> for YARN-specific configuration.</dd>
+                        </dl>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="job-name">job.name</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        <strong>Required:</strong> The name of your job. This name appears on the Samza dashboard, and it
+                        is used to tell apart this job's checkpoints from other jobs' checkpoints.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="job-id">job.id</td>
+                    <td class="default">1</td>
+                    <td class="description">
+                        If you run several instances of your job at the same time, you need to give each execution a
+                        different <code>job.id</code>. This is important, since otherwise the jobs will overwrite each
+                        others' checkpoints, and perhaps interfere with each other in other ways.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="job-config-rewriter-class">job.config.rewriter.<br><span class="rewriter">rewriter-name</span>.class</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        You can optionally define configuration rewriters, which have the opportunity to dynamically
+                        modify the job configuration before the job is started. For example, this can be useful for
+                        pulling configuration from an external configuration management system, or for determining
+                        the set of input streams dynamically at runtime. The value of this property is a
+                        fully-qualified Java classname which must implement
+                        <a href="../api/javadocs/org/apache/samza/config/ConfigRewriter.html">ConfigRewriter</a>.
+                        Samza ships with one rewriter by default:
+                        <dl>
+                            <dt><code>org.apache.samza.config.RegExTopicGenerator</code></dt>
+                            <dd>When consuming from Kafka, this allows you to consume all Kafka topics that match
+                                some regular expression (rather than having to list each topic explicitly).
+                                This rewriter has <a href="#regex-rewriter">additional configuration</a>.</dd>
+                        </dl>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="job-config-rewriters">job.config.rewriters</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        If you have defined configuration rewriters, you need to list them here, in the order in
+                        which they should be applied. The value of this property is a comma-separated list of
+                        <span class="rewriter">rewriter-name</span> tokens.
+                    </td>
+                </tr>
+
+                <tr>
+                    <th colspan="3" class="section" id="task"><a href="../api/overview.html">Task configuration</a></th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-class">task.class</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        <strong>Required:</strong> The fully-qualified name of the Java class which processes
+                        incoming messages from input streams. The class must implement
+                        <a href="../api/javadocs/org/apache/samza/task/StreamTask.html">StreamTask</a>, and may optionally implement
+                        <a href="../api/javadocs/org/apache/samza/task/InitableTask.html">InitableTask</a>,
+                        <a href="../api/javadocs/org/apache/samza/task/ClosableTask.html">ClosableTask</a> and/or
+                        <a href="../api/javadocs/org/apache/samza/task/WindowableTask.html">WindowableTask</a>.
+                        The class will be instantiated several times, once for every
+                        <a href="../container/samza-container.html#tasks-and-partitions">input stream partition</a>.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-inputs">task.inputs</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        <strong>Required:</strong> A comma-separated list of streams that are consumed by this job.
+                        Each stream is given in the format
+                        <span class="system">system-name</span>.<span class="stream">stream-name</span>.
+                        For example, if you have one input system called <code>my-kafka</code>, and want to consume two
+                        Kafka topics called <code>PageViewEvent</code> and <code>UserActivityEvent</code>, then you would set
+                        <code>task.inputs=my-kafka.PageViewEvent, my-kafka.UserActivityEvent</code>.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-window-ms">task.window.ms</td>
+                    <td class="default">-1</td>
+                    <td class="description">
+                        If <a href="#task-class" class="property">task.class</a> implements
+                        <a href="../api/javadocs/org/apache/samza/task/WindowableTask.html">WindowableTask</a>, it can
+                        receive a <a href="../container/windowing.html">windowing callback</a> in regular intervals.
+                        This property specifies the time between window() calls, in milliseconds. If the number is
+                        negative (the default), window() is never called. Note that Samza is
+                        <a href="../container/event-loop.html">single-threaded</a>, so a window() call will never
+                        occur concurrently with the processing of a message. If a message is being processed at the
+                        time when a window() call is due, the window() call occurs after the processing of the current
+                        message has completed.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-checkpoint-factory">task.checkpoint.factory</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        To enable <a href="../container/checkpointing.html">checkpointing</a>, you must set
+                        this property to the fully-qualified name of a Java class that implements
+                        <a href="../api/javadocs/org/apache/samza/checkpoint/CheckpointManagerFactory.html">CheckpointManagerFactory</a>.
+                        This is not required, but recommended for most jobs. If you don't configure checkpointing,
+                        and a job or container restarts, it does not remember which messages it has already processed.
+                        Without checkpointing, consumer behavior is determined by the
+                        <a href="#systems-samza-offset-default" class="property">...samza.offset.default</a>
+                        setting, which by default skips any messages that were published while the container was
+                        restarting. Checkpointing allows a job to start up where it previously left off.
+                        Samza ships with two checkpoint managers by default:
+                        <dl>
+                            <dt><code>org.apache.samza.checkpoint.file.FileSystemCheckpointManagerFactory</code></dt>
+                            <dd>Writes checkpoints to files on the local filesystem. You can configure the file path
+                                with the <a href="#task-checkpoint-path" class="property">task.checkpoint.path</a>
+                                property. This is a simple option if your job always runs on the same machine.
+                                On a multi-machine cluster, this would require a network filesystem mount.</dd>
+                            <dt><code>org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory</code></dt>
+                            <dd>Writes checkpoints to a dedicated topic on a Kafka cluster. This is the recommended
+                                option if you are already using Kafka for input or output streams. Use the
+                                <a href="#task-checkpoint-system" class="property">task.checkpoint.system</a>
+                                property to configure which Kafka cluster to use for checkpoints.</dd>
+                        </dl>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-commit-ms">task.commit.ms</td>
+                    <td class="default">60000</td>
+                    <td class="description">
+                        If <a href="#task-checkpoint-factory" class="property">task.checkpoint.factory</a> is
+                        configured, this property determines how often a checkpoint is written. The value is
+                        the time between checkpoints, in milliseconds. The frequency of checkpointing affects
+                        failure recovery: if a container fails unexpectedly (e.g. due to crash or machine failure)
+                        and is restarted, it resumes processing at the last checkpoint. Any messages processed
+                        since the last checkpoint on the failed container are processed again. Checkpointing
+                        more frequently reduces the number of messages that may be processed twice, but also
+                        uses more resources.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-command-class">task.command.class</td>
+                    <td class="default">org.apache.samza.job.<br>ShellCommandBuilder</td>
+                    <td class="description">
+                        The fully-qualified name of the Java class which determines the command line and environment
+                        variables for a <a href="../container/samza-container.html">container</a>. It must be a subclass of
+                        <a href="../api/javadocs/org/apache/samza/job/CommandBuilder.html">CommandBuilder</a>.
+                        Normally there is no need to change the default, with one exception: if you are using
+                        <code>LocalJobFactory</code> for <a href="#job-factory-class" class="property">job.factory.class</a>,
+                        and <code>task.command.class</code> is not set, the job is run using threads in the current JVM
+                        by default. Thus, any <a href="#task-opts" class="property">task.opts</a> you have configured
+                        will have no effect. If you want <code>LocalJobFactory</code> to use a subprocess rather than
+                        a thread-based executor, you must specify
+                        <code>task.command.class=org.apache.samza.job.ShellCommandBuilder</code> explicitly.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-opts">task.opts</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Any JVM options to include in the command line when executing Samza containers. For example,
+                        this can be used to set the JVM heap size, to tune the garbage collector, or to enable
+                        <a href="/learn/tutorials/0.7.0/remote-debugging-samza.html">remote debugging</a>. Note
+                        there are some issues with the current implementation of <code>task.opts</code>:
+                        <ul>
+                            <li>If you set this property, the log configuration is disrupted. Please see
+                            <a href="https://issues.apache.org/jira/browse/SAMZA-109">SAMZA-109</a> for a workaround.</li>
+                            <li>When running with <code>LocalJobFactory</code>, you need to also set
+                            <a href="#task-command-class" class="property">task.command.class</a>, otherwise the
+                            <code>task.opts</code> setting won't take effect.</li>
+                        </ul>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-execute">task.execute</td>
+                    <td class="default">bin/run-container.sh</td>
+                    <td class="description">
+                        The command that starts a Samza container. The script must be included in the
+                        <a href="packaging.html">job package</a>. There is usually no need to customize this.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-chooser-class">task.chooser.class</td>
+                    <td class="default">org.apache.samza.<br>system.chooser.<br>RoundRobinChooserFactory</td>
+                    <td class="description">
+                        This property can be optionally set to override the default
+                        <a href="../container/streams.html#messagechooser">message chooser</a>, which determines the
+                        order in which messages from multiple input streams are processed. The value of this
+                        property is the fully-qualified name of a Java class that implements
+                        <a href="../api/javadocs/org/apache/samza/system/chooser/MessageChooserFactory.html">MessageChooserFactory</a>.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-lifecycle-listener-class">task.lifecycle.listener.<br><span class="listener">listener-name</span>.class</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Use this property to register a
+                        <a href="../container/event-loop.html#lifecycle-listeners">lifecycle listener</a>, which can receive
+                        a notification when a container starts up or shuts down, or when a message is processed.
+                        The value is the fully-qualified name of a Java class that implements
+                        <a href="../api/javadocs/org/apache/samza/task/TaskLifecycleListenerFactory.html">TaskLifecycleListenerFactory</a>.
+                        You can define multiple lifecycle listeners, each with a different <span class="listener">listener-name</span>,
+                        and reference them in <a href="#task-lifecycle-listeners" class="property">task.lifecycle.listeners</a>.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-lifecycle-listeners">task.lifecycle.listeners</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        If you have defined <a href="../container/event-loop.html#lifecycle-listeners">lifecycle listeners</a> with
+                        <a href="#task-lifecycle-listener-class" class="property">task.lifecycle.listener.*.class</a>,
+                        you need to list them here in order to enable them. The value of this property is a
+                        comma-separated list of <span class="listener">listener-name</span> tokens.
+                    </td>
+                </tr>
+
+                <tr>
+                    <th colspan="3" class="section" id="streams"><a href="../container/streams.html">Systems (input and output streams)</a></th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-factory">systems.<span class="system">system-name</span>.<br>samza.factory</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        <strong>Required:</strong> The fully-qualified name of a Java class which provides a
+                        <em>system</em>. A system can provide input streams which you can consume in your Samza job,
+                        or output streams to which you can write, or both. The requirements on a system are very
+                        flexible &mdash; it may connect to a message broker, or read and write files, or use a database,
+                        or anything else. The class must implement
+                        <a href="../api/javadocs/org/apache/samza/system/SystemFactory.html">SystemFactory</a>.
+                        Samza ships with the following implementations:
+                        <dl>
+                            <dt><code>org.apache.samza.system.kafka.KafkaSystemFactory</code></dt>
+                            <dd>Connects to a cluster of <a href="http://kafka.apache.org/">Kafka</a> brokers, allows
+                                Kafka topics to be consumed as streams in Samza, allows messages to be published to
+                                Kafka topics, and allows Kafka to be used for checkpointing (see
+                                <a href="#task-checkpoint-factory" class="property">task.checkpoint.factory</a>).
+                                See also <a href="#kafka">configuration of a Kafka system</a>.</dd>
+                            <dt><code>org.apache.samza.system.filereader.FileReaderSystemFactory</code></dt>
+                            <dd>Reads data from a file on the local filesystem (the stream name is the path of the
+                                file to read). The file is read as ASCII, and treated as a stream of messages separated
+                                by newline (<code>\n</code>) characters. A task can consume each line of the file as
+                                a <code>java.lang.String</code> object. This system does not provide output streams.</dd>
+                        </dl>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-key-serde">systems.<span class="system">system-name</span>.<br>samza.key.serde</td>
+                    <td class="default" rowspan="2"></td>
+                    <td class="description" rowspan="2">
+                        The <a href="../container/serialization.html">serde</a> which will be used to deserialize the
+                        <em>key</em> of messages on input streams, and to serialize the <em>key</em> of messages on
+                        output streams. This property can be defined either for an individual stream, or for all
+                        streams within a system (if both are defined, the stream-level definition takes precedence).
+                        The value of this property must be a <span class="serde">serde-name</span> that is registered
+                        with <a href="#serializers-registry-class" class="property">serializers.registry.*.class</a>.
+                        If this property is not set, messages are passed unmodified between the input stream consumer,
+                        the task and the output stream producer.
+                    </td>
+                </tr>
+                <tr>
+                    <td class="property">systems.<span class="system">system-name</span>.<br>streams.<span class="stream">stream-name</span>.<br>samza.key.serde</td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-msg-serde">systems.<span class="system">system-name</span>.<br>samza.msg.serde</td>
+                    <td class="default" rowspan="2"></td>
+                    <td class="description" rowspan="2">
+                        The <a href="../container/serialization.html">serde</a> which will be used to deserialize the
+                        <em>value</em> of messages on input streams, and to serialize the <em>value</em> of messages on
+                        output streams. This property can be defined either for an individual stream, or for all
+                        streams within a system (if both are defined, the stream-level definition takes precedence).
+                        The value of this property must be a <span class="serde">serde-name</span> that is registered
+                        with <a href="#serializers-registry-class" class="property">serializers.registry.*.class</a>.
+                        If this property is not set, messages are passed unmodified between the input stream consumer,
+                        the task and the output stream producer.
+                    </td>
+                </tr>
+                <tr>
+                    <td class="property">systems.<span class="system">system-name</span>.<br>streams.<span class="stream">stream-name</span>.<br>samza.msg.serde</td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-offset-default">systems.<span class="system">system-name</span>.<br>samza.offset.default</td>
+                    <td class="default" rowspan="2">upcoming</td>
+                    <td class="description" rowspan="2">
+                        If a container starts up without a <a href="../container/checkpointing.html">checkpoint</a>,
+                        this property determines where in the input stream we should start consuming. The value must be an
+                        <a href="../api/javadocs/org/apache/samza/system/SystemStreamMetadata.OffsetType.html">OffsetType</a>,
+                        one of the following:
+                        <dl>
+                            <dt><code>upcoming</code></dt>
+                            <dd>Start processing messages that are published after the job starts. Any messages published while 
+                                the job was not running are not processed.</dd>
+                            <dt><code>oldest</code></dt>
+                            <dd>Start processing at the oldest available message in the system, and
+                                <a href="reprocessing.html">reprocess</a> the entire available message history.</dd>
+                        </dl>
+                        This property can be defined either for an individual stream, or for all streams within a system
+                        (if both are defined, the stream-level definition takes precedence).
+                    </td>
+                </tr>
+                <tr>
+                    <td class="property">systems.<span class="system">system-name</span>.<br>streams.<span class="stream">stream-name</span>.<br>samza.offset.default</td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-streams-samza-reset-offset">systems.<span class="system">system-name</span>.<br>streams.<span class="stream">stream-name</span>.<br>samza.reset.offset</td>
+                    <td>false</td>
+                    <td>
+                        If set to <code>true</code>, when a Samza container starts up, it ignores any
+                        <a href="../container/checkpointing.html">checkpointed offset</a> for this particular input
+                        stream. Its behavior is thus determined by the <code>samza.offset.default</code> setting.
+                        Note that the reset takes effect <em>every time a container is started</em>, which may be
+                        every time you restart your job, or more frequently if a container fails and is restarted
+                        by the framework.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-streams-samza-priority">systems.<span class="system">system-name</span>.<br>streams.<span class="stream">stream-name</span>.<br>samza.priority</td>
+                    <td>-1</td>
+                    <td>
+                        If one or more streams have a priority set (any positive integer), they will be processed
+                        with <a href="../container/streams.html#prioritizing-input-streams">higher priority</a> than the other streams.
+                        You can set several streams to the same priority, or define multiple priority levels by
+                        assigning a higher number to the higher-priority streams. If a higher-priority stream has
+                        any messages available, they will always be processed first; messages from lower-priority
+                        streams are only processed when there are no new messages on higher-priority inputs.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-streams-samza-bootstrap">systems.<span class="system">system-name</span>.<br>streams.<span class="stream">stream-name</span>.<br>samza.bootstrap</td>
+                    <td>false</td>
+                    <td>
+                        If set to <code>true</code>, this stream will be processed as a
+                        <a href="../container/streams.html#bootstrapping">bootstrap stream</a>. This means that every time
+                        a Samza container starts up, this stream will be fully consumed before messages from any
+                        other stream are processed.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-consumer-batch-size">task.consumer.batch.size</td>
+                    <td>1</td>
+                    <td>
+                        If set to a positive integer, the task will try to consume
+                        <a href="../container/streams.html#batching">batches</a> with the given number of messages
+                        from each input stream, rather than consuming round-robin from all the input streams on
+                        each individual message. Setting this property can improve performance in some cases.
+                    </td>
+                </tr>
+
+                <tr>
+                    <th colspan="3" class="section" id="serdes"><a href="../container/serialization.html">Serializers/Deserializers (Serdes)</a></th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="serializers-registry-class">serializers.registry.<br><span class="serde">serde-name</span>.class</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Use this property to register a <a href="../container/serialization.html">serializer/deserializer</a>,
+                        which defines a way of encoding application objects as an array of bytes (used for messages
+                        in streams, and for data in persistent storage). You can give a serde any
+                        <span class="serde">serde-name</span> you want, and reference that name in properties like
+                        <a href="#systems-samza-key-serde" class="property">systems.*.samza.key.serde</a>,
+                        <a href="#systems-samza-msg-serde" class="property">systems.*.samza.msg.serde</a>,
+                        <a href="#stores-key-serde" class="property">stores.*.key.serde</a> and
+                        <a href="#stores-msg-serde" class="property">stores.*.msg.serde</a>.
+                        The value of this property is the fully-qualified name of a Java class that implements
+                        <a href="../api/javadocs/org/apache/samza/serializers/SerdeFactory.html">SerdeFactory</a>.
+                        Samza ships with several serdes:
+                        <dl>
+                            <dt><code>org.apache.samza.serializers.ByteSerdeFactory</code></dt>
+                            <dd>A no-op serde which passes through the undecoded byte array.</dd>
+                            <dt><code>org.apache.samza.serializers.IntegerSerdeFactory</code></dt>
+                            <dd>Encodes <code>java.lang.Integer</code> objects as binary (4 bytes fixed-length big-endian encoding).</dd>
+                            <dt><code>org.apache.samza.serializers.StringSerdeFactory</code></dt>
+                            <dd>Encodes <code>java.lang.String</code> objects as UTF-8.</dd>
+                            <dt><code>org.apache.samza.serializers.JsonSerdeFactory</code></dt>
+                            <dd>Encodes nested structures of <code>java.util.Map</code>, <code>java.util.List</code> etc. as JSON.</dd>
+                            <dt><code>org.apache.samza.serializers.MetricsSnapshotSerdeFactory</code></dt>
+                            <dd>Encodes <code>org.apache.samza.metrics.reporter.MetricsSnapshot</code> objects (which are
+                                used for <a href="../container/metrics.html">reporting metrics</a>) as JSON.</dd>
+                            <dt><code>org.apache.samza.serializers.KafkaSerdeFactory</code></dt>
+                            <dd>Adapter which allows existing <code>kafka.serializer.Encoder</code> and
+                                <code>kafka.serializer.Decoder</code> implementations to be used as Samza serdes.
+                                Set serializers.registry.<span class="serde">serde-name</span>.encoder and
+                                serializers.registry.<span class="serde">serde-name</span>.decoder to the appropriate
+                                class names.</dd>
+                        </dl>
+                    </td>
+                </tr>
+
+                <tr>
+                    <th colspan="3" class="section" id="filesystem-checkpoints">
+                        Using the filesystem for checkpoints<br>
+                        <span class="subtitle">
+                            (This section applies if you have set
+                            <a href="#task-checkpoint-factory" class="property">task.checkpoint.factory</a>
+                            <code>= org.apache.samza.checkpoint.file.FileSystemCheckpointManagerFactory</code>)
+                        </span>
+                    </th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-checkpoint-path">task.checkpoint.path</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Required if you are using the filesystem for checkpoints. Set this to the path on your local filesystem
+                        where checkpoint files should be stored.
+                    </td>
+                </tr>
+
+                <tr>
+                    <th colspan="3" class="section" id="kafka">
+                        Using <a href="http://kafka.apache.org/">Kafka</a> for input streams, output streams and checkpoints<br>
+                        <span class="subtitle">
+                            (This section applies if you have set
+                            <a href="#systems-samza-factory" class="property">systems.*.samza.factory</a>
+                            <code>= org.apache.samza.system.kafka.KafkaSystemFactory</code>)
+                        </span>
+                    </th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-consumer-zookeeper-connect">systems.<span class="system">system-name</span>.<br>consumer.zookeeper.connect</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        The hostname and port of one or more Zookeeper nodes where information about the
+                        Kafka cluster can be found. This is given as a comma-separated list of
+                        <code>hostname:port</code> pairs, such as
+                        <code>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</code>.
+                        If the cluster information is at some sub-path of the Zookeeper namespace, you need to
+                        include the path at the end of the list of hostnames, for example:
+                        <code>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/clusters/my-kafka</code>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-consumer-auto-offset-reset">systems.<span class="system">system-name</span>.<br>consumer.auto.offset.reset</td>
+                    <td class="default">largest</td>
+                    <td class="description">
+                        This setting determines what happens if a consumer attempts to read an offset that is
+                        outside of the current valid range. This could happen if the topic does not exist, or
+                        if a checkpoint is older than the maximum message history retained by the brokers.
+                        This property is not to be confused with
+                        <a href="#systems-samza-offset-default">systems.*.samza.offset.default</a>,
+                        which determines what happens if there is no checkpoint. The following are valid
+                        values for <code>auto.offset.reset</code>:
+                        <dl>
+                            <dt><code>smallest</code></dt>
+                            <dd>Start consuming at the smallest (oldest) offset available on the broker
+                                (process as much message history as available).</dd>
+                            <dt><code>largest</code></dt>
+                            <dd>Start consuming at the largest (newest) offset available on the broker
+                                (skip any messages published while the job was not running).</dd>
+                            <dt>anything else</dt>
+                            <dd>Throw an exception and refuse to start up the job.</dd>
+                        </dl>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-consumer">systems.<span class="system">system-name</span>.<br>consumer.*</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Any <a href="http://kafka.apache.org/documentation.html#consumerconfigs">Kafka consumer configuration</a>
+                        can be included here. For example, to change the socket timeout, you can set
+                        systems.<span class="system">system-name</span>.consumer.socket.timeout.ms.
+                        (There is no need to configure <code>group.id</code> or <code>client.id</code>,
+                        as they are automatically configured by Samza. Also, there is no need to set
+                        <code>auto.commit.enable</code> because Samza has its own checkpointing mechanism.)
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-producer-metadata-broker-list">systems.<span class="system">system-name</span>.<br>producer.metadata.broker.list</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        A list of network endpoints where the Kafka brokers are running. This is given as
+                        a comma-separated list of <code>hostname:port</code> pairs, for example
+                        <code>kafka1.example.com:9092,kafka2.example.com:9092,kafka3.example.com:9092</code>.
+                        It's not necessary to list every single Kafka node in the cluster: Samza uses this
+                        property in order to discover which topics and partitions are hosted on which broker.
+                        This property is needed even if you are only consuming from Kafka, and not writing
+                        to it, because Samza uses it to discover metadata about streams being consumed.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-producer-producer-type">systems.<span class="system">system-name</span>.<br>producer.producer.type</td>
+                    <td class="default">sync</td>
+                    <td class="description">
+                        Controls whether messages emitted from a stream processor should be buffered before
+                        they are sent to Kafka. The options are:
+                        <dl>
+                            <dt><code>sync</code></dt>
+                            <dd>Any messages sent to output streams are synchronously flushed to the Kafka brokers
+                                before the next message from an input stream is processed.</dd>
+                            <dt><code>async</code></dt>
+                            <dd>Messages sent to output streams are buffered within the Samza container, and published
+                                to the Kafka brokers as a batch. This setting can increase throughput, but
+                                risks buffered messages being lost if a container abruptly fails. The maximum
+                                number of messages to buffer is controlled with
+                                systems.<span class="system">system-name</span>.producer.batch.num.messages
+                                and the maximum time (in milliseconds) to wait before flushing the buffer is set with
+                                systems.<span class="system">system-name</span>.producer.queue.buffering.max.ms.</dd>
+                        </dl>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-producer">systems.<span class="system">system-name</span>.<br>producer.*</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Any <a href="http://kafka.apache.org/documentation.html#producerconfigs">Kafka producer configuration</a>
+                        can be included here. For example, to change the request timeout, you can set
+                        systems.<span class="system">system-name</span>.producer.request.timeout.ms.
+                        (There is no need to configure <code>client.id</code> as it is automatically
+                        configured by Samza.)
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="systems-samza-fetch-threshold">systems.<span class="system">system-name</span>.<br>samza.fetch.threshold</td>
+                    <td class="default">50000</td>
+                    <td class="description">
+                        When consuming streams from Kafka, a Samza container maintains an in-memory buffer
+                        for incoming messages in order to increase throughput (the stream task can continue
+                        processing buffered messages while new messages are fetched from Kafka). This
+                        parameter determines the number of messages we aim to buffer across all stream
+                        partitions consumed by a container. For example, if a container consumes 50 partitions,
+                        it will try to buffer 1000 messages per partition by default. When the number of
+                        buffered messages falls below that threshold, Samza fetches more messages from the
+                        Kafka broker to replenish the buffer. Increasing this parameter can increase a job's
+                        processing throughput, but also increases the amount of memory used.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-checkpoint-system">task.checkpoint.system</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        This property is required if you are using Kafka for checkpoints
+                        (<a href="#task-checkpoint-factory" class="property">task.checkpoint.factory</a>
+                        <code>= org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory</code>).
+                        You must set it to the <span class="system">system-name</span> of a Kafka system. The stream
+                        name (topic name) within that system is automatically determined from the job name and ID:
+                        <code>__samza_checkpoint_${<a href="#job-name" class="property">job.name</a>}_${<a href="#job-id" class="property">job.id</a>}</code>
+                        (with underscores in the job name and ID replaced by hyphens).
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="task-checkpoint-replication-factor">task.checkpoint.<br>replication.factor</td>
+                    <td class="default">3</td>
+                    <td class="description">
+                        If you are using Kafka for checkpoints, this is the number of Kafka nodes to which you want the
+                        checkpoint topic replicated for durability.
+                    </td>
+                </tr>
+
+                <tr>
+                    <th colspan="3" class="section" id="regex-rewriter">
+                        Consuming all Kafka topics matching a regular expression<br>
+                        <span class="subtitle">
+                            (This section applies if you have set
+                            <a href="#job-config-rewriter-class" class="property">job.config.rewriter.*.class</a>
+                            <code>= org.apache.samza.config.RegExTopicGenerator</code>)
+                        </span>
+                    </th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="job-config-rewriter-system">job.config.rewriter.<br><span class="rewriter">rewriter-name</span>.system</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Set this property to the <span class="system">system-name</span> of the Kafka system
+                        from which you want to consume all matching topics.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="job-config-rewriter-regex">job.config.rewriter.<br><span class="rewriter">rewriter-name</span>.regex</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        A regular expression specifying which topics you want to consume within the Kafka system
+                        <a href="#job-config-rewriter-system" class="property">job.config.rewriter.*.system</a>.
+                        Any topics matched by this regular expression will be consumed <em>in addition to</em> any
+                        topics you specify with <a href="#task-inputs" class="property">task.inputs</a>.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="job-config-rewriter-config">job.config.rewriter.<br><span class="rewriter">rewriter-name</span>.config.*</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Any properties specified within this namespace are applied to the configuration of streams
+                        that match the regex in
+                        <a href="#job-config-rewriter-regex" class="property">job.config.rewriter.*.regex</a>.
+                        For example, you can set <code>job.config.rewriter.*.config.samza.msg.serde</code> to configure
+                        the deserializer for messages in the matching streams, which is equivalent to setting
+                        <a href="#systems-samza-msg-serde" class="property">systems.*.streams.*.samza.msg.serde</a>
+                        for each topic that matches the regex.
+                    </td>
+                </tr>
+
+                <tr>
+                    <th colspan="3" class="section" id="state"><a href="../container/state-management.html">Storage and State Management</a></th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-factory">stores.<span class="store">store-name</span>.factory</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        This property defines a store, Samza's mechanism for efficient
+                        <a href="../container/state-management.html">stateful stream processing</a>. You can give a
+                        store any <span class="store">store-name</span>, and use that name to get a reference to the
+                        store in your stream task (call
+                        <a href="../api/javadocs/org/apache/samza/task/TaskContext.html#getStore(java.lang.String)">TaskContext.getStore()</a>
+                        in your task's
+                        <a href="../api/javadocs/org/apache/samza/task/InitableTask.html#init(org.apache.samza.config.Config, org.apache.samza.task.TaskContext)">init()</a>
+                        method). The value of this property is the fully-qualified name of a Java class that implements
+                        <a href="../api/javadocs/org/apache/samza/storage/StorageEngineFactory.html">StorageEngineFactory</a>.
+                        Samza currently ships with one storage engine implementation:
+                        <dl>
+                            <dt><code>org.apache.samza.storage.kv.KeyValueStorageEngineFactory</code></dt>
+                            <dd>An on-disk storage engine with a key-value interface, implemented using
+                                <a href="https://code.google.com/p/leveldb/">LevelDB</a>. It supports fast random-access
+                                reads and writes, as well as range queries on keys. LevelDB can be configured with
+                                various <a href="#keyvalue">additional tuning parameters</a>.</dd>
+                        </dl>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-key-serde">stores.<span class="store">store-name</span>.key.serde</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        If the storage engine expects keys in the store to be simple byte arrays, this
+                        <a href="../container/serialization.html">serde</a> allows the stream task to access the
+                        store using another object type as key. The value of this property must be a
+                        <span class="serde">serde-name</span> that is registered with
+                        <a href="#serializers-registry-class" class="property">serializers.registry.*.class</a>.
+                        If this property is not set, keys are passed unmodified to the storage engine
+                        (and the <a href="#stores-changelog">changelog stream</a>, if appropriate).
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-msg-serde">stores.<span class="store">store-name</span>.msg.serde</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        If the storage engine expects values in the store to be simple byte arrays, this
+                        <a href="../container/serialization.html">serde</a> allows the stream task to access the
+                        store using another object type as value. The value of this property must be a
+                        <span class="serde">serde-name</span> that is registered with
+                        <a href="#serializers-registry-class" class="property">serializers.registry.*.class</a>.
+                        If this property is not set, values are passed unmodified to the storage engine
+                        (and the <a href="#stores-changelog">changelog stream</a>, if appropriate).
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-changelog">stores.<span class="store">store-name</span>.changelog</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Samza stores are local to a container. If the container fails, the contents of the
+                        store are lost. To prevent loss of data, you need to set this property to configure
+                        a changelog stream: Samza then ensures that writes to the store are replicated to
+                        this stream, and the store is restored from this stream after a failure. The value
+                        of this property is given in the form
+                        <span class="system">system-name</span>.<span class="stream">stream-name</span>.
+                        Any output stream can be used as changelog, but you must ensure that only one job
+                        ever writes to a given changelog stream (each instance of a job and each store
+                        needs its own changelog stream).
+                    </td>
+                </tr>
+
+                <tr>
+                    <th colspan="3" class="section" id="keyvalue">
+                        Using LevelDB for key-value storage<br>
+                        <span class="subtitle">
+                            (This section applies if you have set
+                            <a href="#stores-factory" class="property">stores.*.factory</a>
+                            <code>= org.apache.samza.storage.kv.KeyValueStorageEngineFactory</code>)
+                        </span>
+                    </th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-write-batch-size">stores.<span class="store">store-name</span>.<br>write.batch.size</td>
+                    <td class="default">500</td>
+                    <td class="description">
+                        For better write performance, the storage engine buffers writes and applies them
+                        to the underlying store in a batch. If the same key is written multiple times
+                        in quick succession, this buffer also deduplicates writes to the same key. This
+                        property is set to the number of key/value pairs that should be kept in this
+                        in-memory buffer, per task instance. The number cannot be greater than
+                        <a href="#stores-object-cache-size" class="property">stores.*.object.cache.size</a>.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-object-cache-size">stores.<span class="store">store-name</span>.<br>object.cache.size</td>
+                    <td class="default">1000</td>
+                    <td class="description">
+                        Samza maintains an additional cache in front of LevelDB for frequently-accessed
+                        objects. This cache contains deserialized objects (avoiding the deserialization
+                        overhead on cache hits), in contrast to the LevelDB block cache
+                        (<a href="#stores-container-cache-size-bytes" class="property">stores.*.container.cache.size.bytes</a>),
+                        which caches serialized objects. This property determines the number of objects
+                        to keep in Samza's cache, per task instance. This same cache is also used for
+                        write buffering (see <a href="#stores-write-batch-size" class="property">stores.*.write.batch.size</a>).
+                        A value of 0 disables all caching and batching.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-container-cache-size-bytes">stores.<span class="store">store-name</span>.container.<br>cache.size.bytes</td>
+                    <td class="default">104857600</td>
+                    <td class="description">
+                        The size of LevelDB's block cache in bytes, per container. If there are several
+                        task instances within one container, each is given a proportional share of this cache.
+                        Note that this is an off-heap memory allocation, so the container's total memory use
+                        is the maximum JVM heap size <em>plus</em> the size of this cache.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-container-write-buffer-size-bytes">stores.<span class="store">store-name</span>.container.<br>write.buffer.size.bytes</td>
+                    <td class="default">33554432</td>
+                    <td class="description">
+                        The amount of memory (in bytes) that LevelDB uses for buffering writes before they are
+                        written to disk, per container. If there are several task instances within one
+                        container, each is given a proportional share of this buffer. This setting also
+                        determines the size of LevelDB's segment files.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-compaction-delete-threshold">stores.<span class="store">store-name</span>.<br>compaction.delete.threshold</td>
+                    <td class="default">-1</td>
+                    <td class="description">
+                        Setting this property forces a LevelDB compaction to be performed after a certain
+                        number of keys have been deleted from the store. This is used to work around
+                        <a href="https://issues.apache.org/jira/browse/SAMZA-254">performance issues</a>
+                        in certain workloads.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-leveldb-compression">stores.<span class="store">store-name</span>.<br>leveldb.compression</td>
+                    <td class="default">snappy</td>
+                    <td class="description">
+                        This property controls whether LevelDB should compress data on disk and in the
+                        block cache. The following values are valid:
+                        <dl>
+                            <dt><code>snappy</code></dt>
+                            <dd>Compress data using the <a href="https://code.google.com/p/snappy/">Snappy</a> codec.</dd>
+                            <dt><code>none</code></dt>
+                            <dd>Do not compress data.</dd>
+                        </dl>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="stores-leveldb-block-size-bytes">stores.<span class="store">store-name</span>.<br>leveldb.block.size.bytes</td>
+                    <td class="default">4096</td>
+                    <td class="description">
+                        If compression is enabled, LevelDB groups approximately this many uncompressed bytes
+                        into one compressed block. You probably don't need to change this property.
+                    </td>
+                </tr>
+
+                <tr>
+                    <th colspan="3" class="section" id="yarn">
+                        Running your job on a <a href="../jobs/yarn-jobs.html">YARN</a> cluster<br>
+                        <span class="subtitle">
+                            (This section applies if you have set
+                            <a href="#job-factory-class" class="property">job.factory.class</a>
+                            <code>= org.apache.samza.job.yarn.YarnJobFactory</code>)
+                        </span>
+                    </th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="yarn-package-path">yarn.package.path</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        <strong>Required for YARN jobs:</strong> The URL from which the job package can
+                        be downloaded, for example a <code>http://</code> or <code>hdfs://</code> URL.
+                        The job package is a .tar.gz file with a
+                        <a href="../jobs/packaging.html">specific directory structure</a>.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="yarn-container-count">yarn.container.count</td>
+                    <td class="default">1</td>
+                    <td class="description">
+                        The number of YARN containers to request for running your job. This is the main parameter
+                        for controlling the scale (allocated computing resources) of your job: to increase the
+                        parallelism of processing, you need to increase the number of containers. The minimum is one
+                        container, and the maximum number of containers is the number of task instances (usually the
+                        <a href="../container/samza-container.html#tasks-and-partitions">number of input stream partitions</a>).
+                        Task instances are evenly distributed across the number of containers that you specify.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="yarn-container-memory-mb">yarn.container.memory.mb</td>
+                    <td class="default">1024</td>
+                    <td class="description">
+                        How much memory, in megabytes, to request from YARN per container of your job. Along with
+                        <a href="#yarn-container-cpu-cores" class="property">yarn.container.cpu.cores</a>, this
+                        property determines how many containers YARN will run on one machine. If the container
+                        exceeds this limit, YARN will kill it, so it is important that the container's actual
+                        memory use remains below the limit. The amount of memory used is normally the JVM heap
+                        size (configured with <a href="#task-opts" class="property">task.opts</a>), plus the
+                        size of any off-heap memory allocation (for example
+                        <a href="#stores-container-cache-size-bytes" class="property">stores.*.container.cache.size.bytes</a>),
+                        plus a safety margin to allow for JVM overheads.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="yarn-container-cpu-cores">yarn.container.cpu.cores</td>
+                    <td class="default">1</td>
+                    <td class="description">
+                        The number of CPU cores to request from YARN per container of your job. Each node in the
+                        YARN cluster has a certain number of CPU cores available, so this number (along with
+                        <a href="#yarn-container-memory-mb" class="property">yarn.container.memory.mb</a>)
+                        determines how many containers can be run on one machine. Samza is
+                        <a href="../container/event-loop.html">single-threaded</a> and designed to run on one
+                        CPU core, so you shouldn't normally need to change this property.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="yarn-container-retry-count">yarn.container.<br>retry.count</td>
+                    <td class="default">8</td>
+                    <td class="description">
+                        If a container fails, it is automatically restarted by YARN. However, if a container keeps
+                        failing shortly after startup, that indicates a deeper problem, so we should kill the job
+                        rather than retrying indefinitely. This property determines the maximum number of times we are
+                        willing to restart a failed container in quick succession (the time period is configured with
+                        <a href="#yarn-container-retry-window-ms" class="property">yarn.container.retry.window.ms</a>).
+                        Each container in the job is counted separately. If this property is set to 0, any failed
+                        container immediately causes the whole job to fail. If it is set to a negative number, there
+                        is no limit on the number of retries.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="yarn-container-retry-window-ms">yarn.container.<br>retry.window.ms</td>
+                    <td class="default">300000</td>
+                    <td class="description">
+                        This property determines how frequently a container is allowed to fail before we give up and
+                        fail the job. If the same container has failed more than
+                        <a href="#yarn-container-retry-count" class="property">yarn.container.retry.count</a>
+                        times, and the time between failures was less than this property
+                        <code>yarn.container.retry.window.ms</code> (in milliseconds), then we fail the job.
+                        There is no limit to the number of times we will restart a container if the time between
+                        failures is greater than <code>yarn.container.retry.window.ms</code>.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="yarn-am-container-memory-mb">yarn.am.container.<br>memory.mb</td>
+                    <td class="default">1024</td>
+                    <td class="description">
+                        Each Samza job has one special container, the
+                        <a href="../yarn/application-master.html">ApplicationMaster</a> (AM), which manages the
+                        execution of the job. This property determines how much memory, in megabytes, to request
+                        from YARN for running the ApplicationMaster.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="yarn-am-opts">yarn.am.opts</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Any JVM options to include in the command line when executing the Samza
+                        <a href="../yarn/application-master.html">ApplicationMaster</a>. For example, this can be
+                        used to set the JVM heap size, to tune the garbage collector, or to enable remote debugging.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="yarn-am-poll-interval-ms">yarn.am.poll.interval.ms</td>
+                    <td class="default">1000</td>
+                    <td class="description">
+                        THe Samza ApplicationMaster sends regular heartbeats to the YARN ResourceManager
+                        to confirm that it is alive. This property determines the time (in milliseconds)
+                        between heartbeats.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="yarn-am-jmx-enabled">yarn.am.jmx.enabled</td>
+                    <td class="default">true</td>
+                    <td class="description">
+                        Determines whether a JMX server should be started on this job's YARN ApplicationMaster
+                        (<code>true</code> or <code>false</code>).
+                    </td>
+                </tr>
+
+                <tr>
+                    <th colspan="3" class="section" id="metrics"><a href="../container/metrics.html">Metrics</a></th>
+                </tr>
+
+                <tr>
+                    <td class="property" id="metrics-reporter-class">metrics.reporter.<br><span class="reporter">reporter-name</span>.class</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        Samza automatically tracks various metrics which are useful for monitoring the health
+                        of a job, and you can also track <a href="../container/metrics.html">your own metrics</a>.
+                        With this property, you can define any number of <em>metrics reporters</em> which send
+                        the metrics to a system of your choice (for graphing, alerting etc). You give each reporter
+                        an arbitrary <span class="reporter">reporter-name</span>. To enable the reporter, you need
+                        to reference the <span class="reporter">reporter-name</span> in
+                        <a href="#metrics-reporters" class="property">metrics.reporters</a>.
+                        The value of this property is the fully-qualified name of a Java class that implements
+                        <a href="../api/javadocs/org/apache/samza/metrics/MetricsReporterFactory.html">MetricsReporterFactory</a>.
+                        Samza ships with these implementations by default:
+                        <dl>
+                            <dt><code>org.apache.samza.metrics.reporter.JmxReporterFactory</code></dt>
+                            <dd>With this reporter, every container exposes its own metrics as JMX MBeans. The JMX
+                                server is started on a <a href="../container/jmx.html">random port</a> to avoid
+                                collisions between containers running on the same machine.</dd>
+                            <dt><code>org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory</code></dt>
+                            <dd>This reporter sends the latest values of all metrics as messages to an output
+                                stream once per minute. The output stream is configured with
+                                <a href="#metrics-reporter-stream" class="property">metrics.reporter.*.stream</a>
+                                and it can use any system supported by Samza.</dd>
+                        </dl>
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="metrics-reporters">metrics.reporters</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        If you have defined any metrics reporters with
+                        <a href="#metrics-reporter-class" class="property">metrics.reporter.*.class</a>, you
+                        need to list them here in order to enable them. The value of this property is a
+                        comma-separated list of <span class="reporter">reporter-name</span> tokens.
+                    </td>
+                </tr>
+
+                <tr>
+                    <td class="property" id="metrics-reporter-stream">metrics.reporter.<br><span class="reporter">reporter-name</span>.stream</td>
+                    <td class="default"></td>
+                    <td class="description">
+                        If you have registered the metrics reporter
+                        <a href="#metrics-reporter-class" class="property">metrics.reporter.*.class</a>
+                        <code>= org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory</code>,
+                        you need to set this property to configure the output stream to which the metrics data
+                        should be sent. The stream is given in the form
+                        <span class="system">system-name</span>.<span class="stream">stream-name</span>,
+                        and the system must be defined in the job configuration. It's fine for many different jobs
+                        to publish their metrics to the same metrics stream. Samza defines a simple
+                        <a href="../container/metrics.html">JSON encoding</a> for metrics; in order to use this
+                        encoding, you also need to configure a serde for the metrics stream:
+                        <ul>
+                            <li><a href="#systems-samza-msg-serde" class="property">systems.*.streams.*.samza.msg.serde</a>
+                                <code>= metrics-serde</code> (replacing the asterisks with the
+                                <span class="system">system-name</span> and <span class="stream">stream-name</span>
+                                of the metrics stream)</li>
+                            <li><a href="#serializers-registry-class" class="property">serializers.registry.metrics-serde.class</a>
+                                <code>= org.apache.samza.serializers.MetricsSnapshotSerdeFactory</code>
+                                (registering the serde under a <span class="serde">serde-name</span> of
+                                <code>metrics-serde</code>)</li>
+                        </ul>
+                    </td>
+                </tr>
+            </tbody>
+        </table>
+    </body>
 </html>

Modified: incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html
URL: http://svn.apache.org/viewvc/incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html?rev=1609232&r1=1609231&r2=1609232&view=diff
==============================================================================
--- incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html (original)
+++ incubator/samza/site/learn/documentation/0.7.0/jobs/configuration.html Wed Jul  9 16:37:01 2014
@@ -23,6 +23,7 @@
     <link href="/css/bootstrap.min.css" rel="stylesheet"/>
     <link href="/css/font-awesome.min.css" rel="stylesheet"/>
     <link href="/css/main.css" rel="stylesheet"/>
+    <link href="/css/syntax.css" rel="stylesheet"/>
     <link rel="icon" type="image/png" href="/img/samza-icon.png">
   </head>
   <body>
@@ -123,23 +124,24 @@
 -->
 
 <p>All Samza jobs have a configuration file that defines the job. A very basic configuration file looks like this:</p>
-<div class="highlight"><pre><code class="language-text" data-lang="text"># Job
-job.factory.class=samza.job.local.LocalJobFactory
-job.name=hello-world
-
-# Task
-task.class=samza.task.example.MyJavaStreamerTask
-task.inputs=example-system.example-stream
-
-# Serializers
-serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
-serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
-
-# Systems
-systems.example-system.samza.factory=samza.stream.example.ExampleConsumerFactory
-systems.example-system.samza.key.serde=string
-systems.example-system.samza.msg.serde=json
-</code></pre></div>
+
+<div class="highlight"><pre><code class="language-jproperties" data-lang="jproperties"><span class="c"># Job</span>
+<span class="na">job.factory.class</span><span class="o">=</span><span class="s">samza.job.local.LocalJobFactory</span>
+<span class="na">job.name</span><span class="o">=</span><span class="s">hello-world</span>
+
+<span class="c"># Task</span>
+<span class="na">task.class</span><span class="o">=</span><span class="s">samza.task.example.MyJavaStreamerTask</span>
+<span class="na">task.inputs</span><span class="o">=</span><span class="s">example-system.example-stream</span>
+
+<span class="c"># Serializers</span>
+<span class="na">serializers.registry.json.class</span><span class="o">=</span><span class="s">org.apache.samza.serializers.JsonSerdeFactory</span>
+<span class="na">serializers.registry.string.class</span><span class="o">=</span><span class="s">org.apache.samza.serializers.StringSerdeFactory</span>
+
+<span class="c"># Systems</span>
+<span class="na">systems.example-system.samza.factory</span><span class="o">=</span><span class="s">samza.stream.example.ExampleConsumerFactory</span>
+<span class="na">systems.example-system.samza.key.serde</span><span class="o">=</span><span class="s">string</span>
+<span class="na">systems.example-system.samza.msg.serde</span><span class="o">=</span><span class="s">json</span></code></pre></div>
+
 <p>There are four major sections to a configuration file:</p>
 
 <ol>
@@ -154,10 +156,10 @@ systems.example-system.samza.msg.serde=j
 <p>Configuration keys that absolutely must be defined for a Samza job are:</p>
 
 <ul>
-<li>job.factory.class</li>
-<li>job.name</li>
-<li>task.class</li>
-<li>task.inputs</li>
+<li><code>job.factory.class</code></li>
+<li><code>job.name</code></li>
+<li><code>task.class</code></li>
+<li><code>task.inputs</code></li>
 </ul>
 
 <h3 id="configuration-keys">Configuration Keys</h3>