Posted to commits@flume.apache.org by de...@apache.org on 2017/10/04 16:41:51 UTC

svn commit: r1019096 [4/5] - in /websites/production/flume/content/releases/content/1.8.0: ./ FlumeDeveloperGuide.html FlumeDeveloperGuide.pdf FlumeUserGuide.html FlumeUserGuide.pdf

Added: websites/production/flume/content/releases/content/1.8.0/FlumeUserGuide.html
==============================================================================
--- websites/production/flume/content/releases/content/1.8.0/FlumeUserGuide.html (added)
+++ websites/production/flume/content/releases/content/1.8.0/FlumeUserGuide.html Wed Oct  4 16:41:51 2017
@@ -0,0 +1,7507 @@
+
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    
+    <title>Flume 1.8.0 User Guide &mdash; Apache Flume</title>
+    
+    <link rel="stylesheet" href="_static/flume.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    '',
+        VERSION:     '',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <link rel="top" title="Apache Flume" href="index.html" />
+    <link rel="up" title="Documentation" href="documentation.html" />
+    <link rel="next" title="Flume 1.8.0 Developer Guide" href="FlumeDeveloperGuide.html" />
+    <link rel="prev" title="Documentation" href="documentation.html" /> 
+  </head>
+  <body>
+<div class="header">
+  <table width="100%" border="0">
+    <tr>
+      <td width="10%">
+        <div class="logo">
+          <a href="index.html">
+            <img class="logo" src="_static/flume-logo.png" alt="Logo"/>
+          </a>
+        </div>
+      </td>
+      <td width="2%">
+          <span class="trademark">&trade;</span>
+      </td>
+      <td width="68%" align="center" class="pageTitle">Apache Flume<sup><span class="trademark">&trade;</span></sup>
+      </td>
+      <td width="20%">
+          <a href="http://www.apache.org">
+            <img src="_static/feather-small.png" alt="Apache Software Foundation" height="70"/>
+          </a>
+      </td>
+    </tr>
+  </table>
+</div>
+  
+
+    <div class="document">
+      <div class="documentwrapper">
+        <div class="bodywrapper">
+          <div class="body">
+            
+  <div class="section" id="flume-1-8-0-user-guide">
+<h1>Flume 1.8.0 User Guide<a class="headerlink" href="#flume-1-8-0-user-guide" title="Permalink to this headline">¶</a></h1>
+<div class="section" id="introduction">
+<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline">¶</a></h2>
+<div class="section" id="overview">
+<h3>Overview<a class="headerlink" href="#overview" title="Permalink to this headline">¶</a></h3>
+<p>Apache Flume is a distributed, reliable, and available system for efficiently
+collecting, aggregating and moving large amounts of log data from many
+different sources to a centralized data store.</p>
+<p>The use of Apache Flume is not only restricted to log data aggregation.
+Since data sources are customizable, Flume can be used to transport massive quantities
+of event data including but not limited to network traffic data, social-media-generated data,
+email messages and pretty much any data source possible.</p>
+<p>Apache Flume is a top level project at the Apache Software Foundation.</p>
+<p>There are currently two release code lines available, versions 0.9.x and 1.x.</p>
+<p>Documentation for the 0.9.x track is available at
+<a class="reference external" href="http://archive.cloudera.com/cdh/3/flume/UserGuide/">the Flume 0.9.x User Guide</a>.</p>
+<p>This documentation applies to the 1.8.x track.</p>
+<p>New and existing users are encouraged to use the 1.x releases so as to
+leverage the performance improvements and configuration flexibilities available
+in the latest architecture.</p>
+</div>
+<div class="section" id="system-requirements">
+<h3>System Requirements<a class="headerlink" href="#system-requirements" title="Permalink to this headline">¶</a></h3>
+<ol class="arabic simple">
+<li>Java Runtime Environment - Java 1.8 or later</li>
+<li>Memory - Sufficient memory for configurations used by sources, channels or sinks</li>
+<li>Disk Space - Sufficient disk space for configurations used by channels or sinks</li>
+<li>Directory Permissions - Read/Write permissions for directories used by agent</li>
+</ol>
+</div>
+<div class="section" id="architecture">
+<h3>Architecture<a class="headerlink" href="#architecture" title="Permalink to this headline">¶</a></h3>
+<div class="section" id="data-flow-model">
+<h4>Data flow model<a class="headerlink" href="#data-flow-model" title="Permalink to this headline">¶</a></h4>
+<p>A Flume event is defined as a unit of data flow having a byte payload and an
+optional set of string attributes. A Flume agent is a (JVM) process that hosts
+the components through which events flow from an external source to the next
+destination (hop).</p>
+<div class="figure align-center">
+<img alt="Agent component diagram" src="_images/UserGuide_image00.png" />
+</div>
+<p>A Flume source consumes events delivered to it by an external source like a web
+server. The external source sends events to Flume in a format that is
+recognized by the target Flume source. For example, an Avro Flume source can be
+used to receive Avro events from Avro clients or other Flume agents in the flow
+that send events from an Avro sink. A similar flow can be defined using
+a Thrift Flume source to receive events from a Thrift sink, a Flume
+Thrift RPC client, or Thrift clients written in any language generated from
+the Flume Thrift protocol. When a Flume source receives an event, it
+stores it into one or more channels. The channel is a passive store that keeps
+the event until it&#8217;s consumed by a Flume sink. The file channel is one example
+&#8211; it is backed by the local filesystem. The sink removes the event
+from the channel and puts it into an external repository like HDFS (via Flume
+HDFS sink) or forwards it to the Flume source of the next Flume agent (next
+hop) in the flow. The source and sink within the given agent run asynchronously
+with the events staged in the channel.</p>
+</div>
+<div class="section" id="complex-flows">
+<h4>Complex flows<a class="headerlink" href="#complex-flows" title="Permalink to this headline">¶</a></h4>
+<p>Flume allows a user to build multi-hop flows where events travel through
+multiple agents before reaching the final destination. It also allows fan-in
+and fan-out flows, contextual routing and backup routes (fail-over) for failed
+hops.</p>
+</div>
+<div class="section" id="reliability">
+<h4>Reliability<a class="headerlink" href="#reliability" title="Permalink to this headline">¶</a></h4>
+<p>The events are staged in a channel on each agent. The events are then delivered
+to the next agent or terminal repository (like HDFS) in the flow. The events
+are removed from a channel only after they are stored in the channel of next
+agent or in the terminal repository. This is how the single-hop message
+delivery semantics in Flume provide end-to-end reliability of the flow.</p>
+<p>Flume uses a transactional approach to guarantee the reliable delivery of the
+events. The sources and sinks encapsulate the storage and retrieval,
+respectively, of the events in a transaction provided by the
+channel. This ensures that the set of events is
+reliably passed from point to point in the flow. In the case of a multi-hop
+flow, the sink from the previous hop and the source from the next hop both have
+their transactions running to ensure that the data is safely stored in the
+channel of the next hop.</p>
+</div>
+<div class="section" id="recoverability">
+<h4>Recoverability<a class="headerlink" href="#recoverability" title="Permalink to this headline">¶</a></h4>
+<p>The events are staged in the channel, which manages recovery from failure.
+Flume supports a durable file channel which is backed by the local file system.
+There&#8217;s also a memory channel which simply stores the events in an in-memory
+queue, which is faster but any events still left in the memory channel when an
+agent process dies can&#8217;t be recovered.</p>
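+<p>As a sketch of this durability trade-off, a durable file channel could be configured
+in place of a memory channel along the following lines (the checkpoint and data
+directory paths here are illustrative placeholders; see the File Channel section for
+the full set of properties):</p>
+<div class="highlight-properties"><pre>a1.channels = c1
+a1.channels.c1.type = file
+# events survive an agent restart because they are persisted to these directories
+a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
+a1.channels.c1.dataDirs = /mnt/flume/data</pre>
+</div>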
+</div>
+</div>
+</div>
+<div class="section" id="setup">
+<h2>Setup<a class="headerlink" href="#setup" title="Permalink to this headline">¶</a></h2>
+<div class="section" id="setting-up-an-agent">
+<h3>Setting up an agent<a class="headerlink" href="#setting-up-an-agent" title="Permalink to this headline">¶</a></h3>
+<p>Flume agent configuration is stored in a local configuration file.  This is a
+text file that follows the Java properties file format.
+Configurations for one or more agents can be specified in the same
+configuration file. The configuration file includes properties of each source,
+sink and channel in an agent and how they are wired together to form data
+flows.</p>
+<div class="section" id="configuring-individual-components">
+<h4>Configuring individual components<a class="headerlink" href="#configuring-individual-components" title="Permalink to this headline">¶</a></h4>
+<p>Each component (source, sink or channel) in the flow has a name, type, and set
+of properties that are specific to the type and instantiation. For example, an
+Avro source needs a hostname (or IP address) and a port number to receive data
+from. A memory channel can have max queue size (&#8220;capacity&#8221;), and an HDFS sink
+needs to know the file system URI, path to create files, frequency of file
+rotation (&#8220;hdfs.rollInterval&#8221;) etc. All such attributes of a component needs to
+be set in the properties file of the hosting Flume agent.</p>
+</div>
+<div class="section" id="wiring-the-pieces-together">
+<h4>Wiring the pieces together<a class="headerlink" href="#wiring-the-pieces-together" title="Permalink to this headline">¶</a></h4>
+<p>The agent needs to know what individual components to load and how they are
+connected in order to constitute the flow. This is done by listing the names of
+each of the sources, sinks and channels in the agent, and then specifying the
+connecting channel for each sink and source. For example, an agent flows events
+from an Avro source called avroWeb to HDFS sink hdfs-cluster1 via a file
+channel called file-channel. The configuration file will contain names of these
+components and file-channel as a shared channel for both avroWeb source and
+hdfs-cluster1 sink.</p>
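+<p>The wiring described above could be sketched in the properties file as follows
+(the agent name <em>agent</em> is arbitrary here, and component properties such as
+bind addresses and HDFS paths are omitted for brevity):</p>
+<div class="highlight-properties"><pre># list the components of this agent
+agent.sources = avroWeb
+agent.sinks = hdfs-cluster1
+agent.channels = file-channel
+
+# the shared channel connects the source and the sink
+agent.sources.avroWeb.channels = file-channel
+agent.sinks.hdfs-cluster1.channel = file-channel</pre>
+</div>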
+</div>
+<div class="section" id="starting-an-agent">
+<h4>Starting an agent<a class="headerlink" href="#starting-an-agent" title="Permalink to this headline">¶</a></h4>
+<p>An agent is started using a shell script called flume-ng which is located in
+the bin directory of the Flume distribution. You need to specify the agent
+name, the config directory, and the config file on the command line:</p>
+<div class="highlight-none"><div class="highlight"><pre>$ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
+</pre></div>
+</div>
+<p>Now the agent will start running the sources and sinks configured in the given
+properties file.</p>
+</div>
+<div class="section" id="a-simple-example">
+<h4>A simple example<a class="headerlink" href="#a-simple-example" title="Permalink to this headline">¶</a></h4>
+<p>Here, we give an example configuration file, describing a single-node Flume deployment.
+This configuration lets a user generate events and subsequently logs them to the console.</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># example.conf: A single-node Flume configuration</span>
+
+<span class="c"># Name the components on this agent</span>
+<span class="na">a1.sources</span> <span class="o">=</span> <span class="s">r1</span>
+<span class="na">a1.sinks</span> <span class="o">=</span> <span class="s">k1</span>
+<span class="na">a1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+
+<span class="c"># Describe/configure the source</span>
+<span class="na">a1.sources.r1.type</span> <span class="o">=</span> <span class="s">netcat</span>
+<span class="na">a1.sources.r1.bind</span> <span class="o">=</span> <span class="s">localhost</span>
+<span class="na">a1.sources.r1.port</span> <span class="o">=</span> <span class="s">44444</span>
+
+<span class="c"># Describe the sink</span>
+<span class="na">a1.sinks.k1.type</span> <span class="o">=</span> <span class="s">logger</span>
+
+<span class="c"># Use a channel which buffers events in memory</span>
+<span class="na">a1.channels.c1.type</span> <span class="o">=</span> <span class="s">memory</span>
+<span class="na">a1.channels.c1.capacity</span> <span class="o">=</span> <span class="s">1000</span>
+<span class="na">a1.channels.c1.transactionCapacity</span> <span class="o">=</span> <span class="s">100</span>
+
+<span class="c"># Bind the source and sink to the channel</span>
+<span class="na">a1.sources.r1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sinks.k1.channel</span> <span class="o">=</span> <span class="s">c1</span>
+</pre></div>
+</div>
+<p>This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel
+that buffers event data in memory, and a sink that logs event data to the console. The configuration file names the
+various components, then describes their types and configuration parameters. A given configuration file might define
+several named agents; when a given Flume process is launched a flag is passed telling it which named agent to manifest.</p>
+<p>Given this configuration file, we can start Flume as follows:</p>
+<div class="highlight-none"><div class="highlight"><pre>$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
+</pre></div>
+</div>
+<p>Note that in a full deployment we would typically include one more option: <tt class="docutils literal"><span class="pre">--conf=&lt;conf-dir&gt;</span></tt>.
+The <tt class="docutils literal"><span class="pre">&lt;conf-dir&gt;</span></tt> directory would include a shell script <em>flume-env.sh</em> and potentially a log4j properties file.
+In this example, we pass a Java option to force Flume to log to the console and we go without a custom environment script.</p>
+<p>From a separate terminal, we can then telnet to port 44444 and send Flume an event:</p>
+<div class="highlight-properties"><pre>$ telnet localhost 44444
+Trying 127.0.0.1...
+Connected to localhost.localdomain (127.0.0.1).
+Escape character is '^]'.
+Hello world! &lt;ENTER&gt;
+OK</pre>
+</div>
+<p>The original Flume terminal will output the event in a log message.</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">12/06/19 15</span><span class="o">:</span><span class="s">32:19 INFO source.NetcatSource: Source starting</span>
+<span class="na">12/06/19 15</span><span class="o">:</span><span class="s">32:19 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]</span>
+<span class="na">12/06/19 15</span><span class="o">:</span><span class="s">32:34 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D          Hello world!. }</span>
+</pre></div>
+</div>
+<p>Congratulations - you&#8217;ve successfully configured and deployed a Flume agent! Subsequent sections cover agent configuration in much more detail.</p>
+</div>
+<div class="section" id="using-environment-variables-in-configuration-files">
+<h4>Using environment variables in configuration files<a class="headerlink" href="#using-environment-variables-in-configuration-files" title="Permalink to this headline">¶</a></h4>
+<p>Flume has the ability to substitute environment variables in the configuration. For example:</p>
+<div class="highlight-none"><div class="highlight"><pre>a1.sources = r1
+a1.sources.r1.type = netcat
+a1.sources.r1.bind = 0.0.0.0
+a1.sources.r1.port = ${NC_PORT}
+a1.sources.r1.channels = c1
+</pre></div>
+</div>
+<p>Note that this currently works for values only, not for keys (i.e. only on the &#8220;right side&#8221; of the <cite>=</cite> sign in the config lines).</p>
+<p>This can be enabled via Java system properties on agent invocation by setting <cite>propertiesImplementation = org.apache.flume.node.EnvVarResolverProperties</cite>.</p>
+<p>For example:</p>
+<div class="highlight-none"><div class="highlight"><pre>$ NC_PORT=44444 bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console -DpropertiesImplementation=org.apache.flume.node.EnvVarResolverProperties
+</pre></div>
+</div>
+<p>Note that the above is just an example; environment variables can be configured in other ways, including being set in <cite>conf/flume-env.sh</cite>.</p>
+</div>
+<div class="section" id="logging-raw-data">
+<h4>Logging raw data<a class="headerlink" href="#logging-raw-data" title="Permalink to this headline">¶</a></h4>
+<p>Logging the raw stream of data flowing through the ingest pipeline is not desired behaviour in
+many production environments because this may result in leaking sensitive data or security related
+configurations, such as secret keys, to Flume log files.
+By default, Flume will not log such information. On the other hand, if the data pipeline is broken,
+Flume will attempt to provide clues for debugging the problem.</p>
+<p>One way to debug problems with event pipelines is to set up an additional <a class="reference internal" href="#memory-channel">Memory Channel</a>
+connected to a <a class="reference internal" href="#logger-sink">Logger Sink</a>, which will output all event data to the Flume logs.
+In some situations, however, this approach is insufficient.</p>
+<p>In order to enable logging of event- and configuration-related data, some Java system properties
+must be set in addition to log4j properties.</p>
+<p>To enable configuration-related logging, set the Java system property
+<tt class="docutils literal"><span class="pre">-Dorg.apache.flume.log.printconfig=true</span></tt>. This can either be passed on the command line or by
+setting this in the <tt class="docutils literal"><span class="pre">JAVA_OPTS</span></tt> variable in <em>flume-env.sh</em>.</p>
+<p>To enable data logging, set the Java system property <tt class="docutils literal"><span class="pre">-Dorg.apache.flume.log.rawdata=true</span></tt>
+in the same way described above. For most components, the log4j logging level must also be set to
+DEBUG or TRACE to make event-specific logging appear in the Flume logs.</p>
+<p>Here is an example of enabling both configuration logging and raw data logging while also
+setting the Log4j loglevel to DEBUG for console output:</p>
+<div class="highlight-none"><div class="highlight"><pre>$ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true
+</pre></div>
+</div>
+</div>
+<div class="section" id="zookeeper-based-configuration">
+<h4>Zookeeper based Configuration<a class="headerlink" href="#zookeeper-based-configuration" title="Permalink to this headline">¶</a></h4>
+<p>Flume supports Agent configurations via Zookeeper. <em>This is an experimental feature.</em> The configuration file needs to be uploaded
+to Zookeeper, under a configurable prefix, and is stored in the Zookeeper node data.
+The following is how the Zookeeper node tree would look for agents a1 and a2:</p>
+<div class="highlight-properties"><pre>- /flume
+ |- /a1 [Agent config file]
+ |- /a2 [Agent config file]</pre>
+</div>
+<p>Once the configuration file is uploaded, start the agent with the following options:</p>
+<div class="highlight-none"><div class="highlight"><pre>$ bin/flume-ng agent --conf conf -z zkhost:2181,zkhost1:2181 -p /flume --name a1 -Dflume.root.logger=INFO,console
+</pre></div>
+</div>
+<table border="1" class="docutils">
+<colgroup>
+<col width="17%" />
+<col width="15%" />
+<col width="68%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Argument Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>z</strong></td>
+<td>&#8211;</td>
+<td>Zookeeper connection string. Comma separated list of hostname:port</td>
+</tr>
+<tr class="row-odd"><td><strong>p</strong></td>
+<td>/flume</td>
+<td>Base Path in Zookeeper to store Agent configurations</td>
+</tr>
+</tbody>
+</table>
+</div>
+<div class="section" id="installing-third-party-plugins">
+<h4>Installing third-party plugins<a class="headerlink" href="#installing-third-party-plugins" title="Permalink to this headline">¶</a></h4>
+<p>Flume has a fully plugin-based architecture. While Flume ships with many
+out-of-the-box sources, channels, sinks, serializers, and the like, many
+implementations exist which ship separately from Flume.</p>
+<p>While it has always been possible to include custom Flume components by
+adding their jars to the FLUME_CLASSPATH variable in the flume-env.sh file,
+Flume now supports a special directory called <tt class="docutils literal"><span class="pre">plugins.d</span></tt> which automatically
+picks up plugins that are packaged in a specific format. This allows for easier
+management of plugin packaging issues as well as simpler debugging and
+troubleshooting of several classes of issues, especially library dependency
+conflicts.</p>
+<div class="section" id="the-plugins-d-directory">
+<h5>The plugins.d directory<a class="headerlink" href="#the-plugins-d-directory" title="Permalink to this headline">¶</a></h5>
+<p>The <tt class="docutils literal"><span class="pre">plugins.d</span></tt> directory is located at <tt class="docutils literal"><span class="pre">$FLUME_HOME/plugins.d</span></tt>. At startup
+time, the <tt class="docutils literal"><span class="pre">flume-ng</span></tt> start script looks in the <tt class="docutils literal"><span class="pre">plugins.d</span></tt> directory for
+plugins that conform to the below format and includes them in proper paths when
+starting up <tt class="docutils literal"><span class="pre">java</span></tt>.</p>
+</div>
+<div class="section" id="directory-layout-for-plugins">
+<h5>Directory layout for plugins<a class="headerlink" href="#directory-layout-for-plugins" title="Permalink to this headline">¶</a></h5>
+<p>Each plugin (subdirectory) within <tt class="docutils literal"><span class="pre">plugins.d</span></tt> can have up to three
+sub-directories:</p>
+<ol class="arabic simple">
+<li>lib - the plugin&#8217;s jar(s)</li>
+<li>libext - the plugin&#8217;s dependency jar(s)</li>
+<li>native - any required native libraries, such as <tt class="docutils literal"><span class="pre">.so</span></tt> files</li>
+</ol>
+<p>Example of two plugins within the plugins.d directory:</p>
+<div class="highlight-none"><div class="highlight"><pre>plugins.d/
+plugins.d/custom-source-1/
+plugins.d/custom-source-1/lib/my-source.jar
+plugins.d/custom-source-1/libext/spring-core-2.5.6.jar
+plugins.d/custom-source-2/
+plugins.d/custom-source-2/lib/custom.jar
+plugins.d/custom-source-2/native/gettext.so
+</pre></div>
+</div>
+</div>
+</div>
+</div>
+<div class="section" id="data-ingestion">
+<h3>Data ingestion<a class="headerlink" href="#data-ingestion" title="Permalink to this headline">¶</a></h3>
+<p>Flume supports a number of mechanisms to ingest data from external sources.</p>
+<div class="section" id="rpc">
+<h4>RPC<a class="headerlink" href="#rpc" title="Permalink to this headline">¶</a></h4>
+<p>An Avro client included in the Flume distribution can send a given file to
+Flume Avro source using avro RPC mechanism:</p>
+<div class="highlight-none"><div class="highlight"><pre>$ bin/flume-ng avro-client -H localhost -p 41414 -F /usr/logs/log.10
+</pre></div>
+</div>
+<p>The above command will send the contents of /usr/logs/log.10 to the Flume
+source listening on that port.</p>
+</div>
+<div class="section" id="executing-commands">
+<h4>Executing commands<a class="headerlink" href="#executing-commands" title="Permalink to this headline">¶</a></h4>
+<p>There&#8217;s an exec source that executes a given command and consumes the output
+one &#8216;line&#8217; at a time, i.e. text terminated by a carriage return (&#8216;\r&#8217;), a line
+feed (&#8216;\n&#8217;), or both together.</p>
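+<p>A minimal sketch of such an exec source, tailing a hypothetical log file (the
+file path and component names are illustrative):</p>
+<div class="highlight-properties"><pre>a1.sources = r1
+a1.sources.r1.type = exec
+# each line of output from this command becomes a Flume event
+a1.sources.r1.command = tail -F /var/log/app.log
+a1.sources.r1.channels = c1</pre>
+</div>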
+</div>
+<div class="section" id="network-streams">
+<h4>Network streams<a class="headerlink" href="#network-streams" title="Permalink to this headline">¶</a></h4>
+<p>Flume supports the following mechanisms to read data from popular log stream
+types:</p>
+<ol class="arabic simple">
+<li>Avro</li>
+<li>Thrift</li>
+<li>Syslog</li>
+<li>Netcat</li>
+</ol>
+</div>
+</div>
+<div class="section" id="setting-multi-agent-flow">
+<h3>Setting multi-agent flow<a class="headerlink" href="#setting-multi-agent-flow" title="Permalink to this headline">¶</a></h3>
+<div class="figure align-center">
+<img alt="Two agents communicating over Avro RPC" src="_images/UserGuide_image03.png" />
+</div>
+<p>In order to flow the data across multiple agents or hops, the sink of the
+previous agent and source of the current hop need to be avro type with the sink
+pointing to the hostname (or IP address) and port of the source.</p>
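+<p>For example, a two-hop flow could be wired with an avro sink on the first agent
+pointing at the avro source of the second (agent names, hostnames and ports here
+are placeholders):</p>
+<div class="highlight-properties"><pre># on the first agent: avro sink pointing to the next hop
+weblog-agent.sinks.avro-forward-sink.type = avro
+weblog-agent.sinks.avro-forward-sink.hostname = collector-host
+weblog-agent.sinks.avro-forward-sink.port = 10000
+
+# on the second agent: avro source listening on the same host and port
+hdfs-agent.sources.avro-collection-source.type = avro
+hdfs-agent.sources.avro-collection-source.bind = collector-host
+hdfs-agent.sources.avro-collection-source.port = 10000</pre>
+</div>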
+</div>
+<div class="section" id="consolidation">
+<h3>Consolidation<a class="headerlink" href="#consolidation" title="Permalink to this headline">¶</a></h3>
+<p>A very common scenario in log collection is a large number of log producing
+clients sending data to a few consumer agents that are attached to the storage
+subsystem. For example, logs collected from hundreds of web servers sent to a
+dozen agents that write to an HDFS cluster.</p>
+<div class="figure align-center">
+<img alt="A fan-in flow using Avro RPC to consolidate events in one place" src="_images/UserGuide_image02.png" />
+</div>
+<p>This can be achieved in Flume by configuring a number of first tier agents with
+an avro sink, all pointing to the avro source of a single agent (again, you could
+use thrift sources/sinks/clients in such a scenario). This source
+on the second tier agent consolidates the received events into a single
+channel which is consumed by a sink to its final destination.</p>
+</div>
+<div class="section" id="multiplexing-the-flow">
+<h3>Multiplexing the flow<a class="headerlink" href="#multiplexing-the-flow" title="Permalink to this headline">¶</a></h3>
+<p>Flume supports multiplexing the event flow to one or more destinations. This is
+achieved by defining a flow multiplexer that can replicate or selectively route
+an event to one or more channels.</p>
+<div class="figure align-center">
+<img alt="A fan-out flow using a (multiplexing) channel selector" src="_images/UserGuide_image01.png" />
+</div>
+<p>The above example shows a source from agent &#8220;foo&#8221; fanning out the flow to three
+different channels. This fan out can be replicating or multiplexing. In case of
+replicating flow, each event is sent to all three channels. For the
+multiplexing case, an event is delivered to a subset of available channels when
+an event&#8217;s attribute matches a preconfigured value. For example, if an event
+attribute called &#8220;txnType&#8221; is set to &#8220;customer&#8221;, then it should go to channel1
+and channel3, if it&#8217;s &#8220;vendor&#8221; then it should go to channel2, otherwise
+channel3. The mapping can be set in the agent&#8217;s configuration file.</p>
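+<p>The &#8220;txnType&#8221; routing described above could be expressed with a multiplexing
+channel selector along these lines (the agent and source names are illustrative):</p>
+<div class="highlight-properties"><pre>foo.sources.r1.selector.type = multiplexing
+foo.sources.r1.selector.header = txnType
+# events with txnType=customer go to channel1 and channel3
+foo.sources.r1.selector.mapping.customer = channel1 channel3
+# events with txnType=vendor go to channel2
+foo.sources.r1.selector.mapping.vendor = channel2
+# everything else goes to channel3
+foo.sources.r1.selector.default = channel3</pre>
+</div>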
+</div>
+</div>
+<div class="section" id="configuration">
+<h2>Configuration<a class="headerlink" href="#configuration" title="Permalink to this headline">¶</a></h2>
+<p>As mentioned in the earlier section, Flume agent configuration is read from a
+file that resembles a Java property file format with hierarchical property
+settings.</p>
+<div class="section" id="defining-the-flow">
+<h3>Defining the flow<a class="headerlink" href="#defining-the-flow" title="Permalink to this headline">¶</a></h3>
+<p>To define the flow within a single agent, you need to link the sources and
+sinks via a channel. You need to list the sources, sinks and channels for the
+given agent, and then point the source and sink to a channel. A source instance
+can specify multiple channels, but a sink instance can only specify one channel.
+The format is as follows:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># list the sources, sinks and channels for the agent</span>
+<span class="na">&lt;Agent&gt;.sources</span> <span class="o">=</span> <span class="s">&lt;Source&gt;</span>
+<span class="na">&lt;Agent&gt;.sinks</span> <span class="o">=</span> <span class="s">&lt;Sink&gt;</span>
+<span class="na">&lt;Agent&gt;.channels</span> <span class="o">=</span> <span class="s">&lt;Channel1&gt; &lt;Channel2&gt;</span>
+
+<span class="c"># set channel for source</span>
+<span class="na">&lt;Agent&gt;.sources.&lt;Source&gt;.channels</span> <span class="o">=</span> <span class="s">&lt;Channel1&gt; &lt;Channel2&gt; ...</span>
+
+<span class="c"># set channel for sink</span>
+<span class="na">&lt;Agent&gt;.sinks.&lt;Sink&gt;.channel</span> <span class="o">=</span> <span class="s">&lt;Channel1&gt;</span>
+</pre></div>
+</div>
+<p>For example, an agent named agent_foo is reading data from an external avro client and sending
+it to HDFS via a memory channel. The config file weblog.config could look like:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># list the sources, sinks and channels for the agent</span>
+<span class="na">agent_foo.sources</span> <span class="o">=</span> <span class="s">avro-appserver-src-1</span>
+<span class="na">agent_foo.sinks</span> <span class="o">=</span> <span class="s">hdfs-sink-1</span>
+<span class="na">agent_foo.channels</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+
+<span class="c"># set channel for source</span>
+<span class="na">agent_foo.sources.avro-appserver-src-1.channels</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+
+<span class="c"># set channel for sink</span>
+<span class="na">agent_foo.sinks.hdfs-sink-1.channel</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+</pre></div>
+</div>
+<p>This will make events flow from the avro-appserver-src-1 source to the hdfs-sink-1 sink
+through the memory channel mem-channel-1. When the agent is started with
+weblog.config as its config file, it will instantiate that flow.</p>
+</div>
+<div class="section" id="id1">
+<h3>Configuring individual components<a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h3>
+<p>After defining the flow, you need to set properties of each source, sink and
+channel. This is done in the same hierarchical namespace fashion where you set
+the component type and other values for the properties specific to each
+component:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># properties for sources</span>
+<span class="na">&lt;Agent&gt;.sources.&lt;Source&gt;.&lt;someProperty&gt;</span> <span class="o">=</span> <span class="s">&lt;someValue&gt;</span>
+
+<span class="c"># properties for channels</span>
+<span class="na">&lt;Agent&gt;.channel.&lt;Channel&gt;.&lt;someProperty&gt;</span> <span class="o">=</span> <span class="s">&lt;someValue&gt;</span>
+
+<span class="c"># properties for sinks</span>
+<span class="na">&lt;Agent&gt;.sources.&lt;Sink&gt;.&lt;someProperty&gt;</span> <span class="o">=</span> <span class="s">&lt;someValue&gt;</span>
+</pre></div>
+</div>
+<p>The property &#8220;type&#8221; needs to be set for each component so that Flume understands
+what kind of object it needs to be. Each source, sink and channel type has its
+own set of properties required for it to function as intended, and all of those need
+to be set as needed. In the previous example, we have a flow from
+avro-AppSrv-source to hdfs-Cluster1-sink through the memory channel
+mem-channel-1. Here&#8217;s an example that shows configuration of each of those
+components:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">agent_foo.sources</span> <span class="o">=</span> <span class="s">avro-AppSrv-source</span>
+<span class="na">agent_foo.sinks</span> <span class="o">=</span> <span class="s">hdfs-Cluster1-sink</span>
+<span class="na">agent_foo.channels</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+
+<span class="c"># set channel for sources, sinks</span>
+
+<span class="c"># properties of avro-AppSrv-source</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source.type</span> <span class="o">=</span> <span class="s">avro</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source.bind</span> <span class="o">=</span> <span class="s">localhost</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source.port</span> <span class="o">=</span> <span class="s">10000</span>
+
+<span class="c"># properties of mem-channel-1</span>
+<span class="na">agent_foo.channels.mem-channel-1.type</span> <span class="o">=</span> <span class="s">memory</span>
+<span class="na">agent_foo.channels.mem-channel-1.capacity</span> <span class="o">=</span> <span class="s">1000</span>
+<span class="na">agent_foo.channels.mem-channel-1.transactionCapacity</span> <span class="o">=</span> <span class="s">100</span>
+
+<span class="c"># properties of hdfs-Cluster1-sink</span>
+<span class="na">agent_foo.sinks.hdfs-Cluster1-sink.type</span> <span class="o">=</span> <span class="s">hdfs</span>
+<span class="na">agent_foo.sinks.hdfs-Cluster1-sink.hdfs.path</span> <span class="o">=</span> <span class="s">hdfs://namenode/flume/webdata</span>
+
+<span class="c">#...</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="adding-multiple-flows-in-an-agent">
+<h3>Adding multiple flows in an agent<a class="headerlink" href="#adding-multiple-flows-in-an-agent" title="Permalink to this headline">¶</a></h3>
+<p>A single Flume agent can contain several independent flows. You can list
+multiple sources, sinks and channels in a config. These components can be
+linked to form multiple flows:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># list the sources, sinks and channels for the agent</span>
+<span class="na">&lt;Agent&gt;.sources</span> <span class="o">=</span> <span class="s">&lt;Source1&gt; &lt;Source2&gt;</span>
+<span class="na">&lt;Agent&gt;.sinks</span> <span class="o">=</span> <span class="s">&lt;Sink1&gt; &lt;Sink2&gt;</span>
+<span class="na">&lt;Agent&gt;.channels</span> <span class="o">=</span> <span class="s">&lt;Channel1&gt; &lt;Channel2&gt;</span>
+</pre></div>
+</div>
+<p>Then you can link the sources and sinks to their corresponding channels (for
+sources) or channel (for sinks) to set up two different flows. For example, if
+you need to set up two flows in an agent, one going from an external avro client
+to external HDFS and another from the output of a tail to an avro sink, then here&#8217;s a
+config to do that:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># list the sources, sinks and channels in the agent</span>
+<span class="na">agent_foo.sources</span> <span class="o">=</span> <span class="s">avro-AppSrv-source1 exec-tail-source2</span>
+<span class="na">agent_foo.sinks</span> <span class="o">=</span> <span class="s">hdfs-Cluster1-sink1 avro-forward-sink2</span>
+<span class="na">agent_foo.channels</span> <span class="o">=</span> <span class="s">mem-channel-1 file-channel-2</span>
+
+<span class="c"># flow #1 configuration</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.channels</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+<span class="na">agent_foo.sinks.hdfs-Cluster1-sink1.channel</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+
+<span class="c"># flow #2 configuration</span>
+<span class="na">agent_foo.sources.exec-tail-source2.channels</span> <span class="o">=</span> <span class="s">file-channel-2</span>
+<span class="na">agent_foo.sinks.avro-forward-sink2.channel</span> <span class="o">=</span> <span class="s">file-channel-2</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="configuring-a-multi-agent-flow">
+<h3>Configuring a multi agent flow<a class="headerlink" href="#configuring-a-multi-agent-flow" title="Permalink to this headline">¶</a></h3>
+<p>To set up a multi-tier flow, you need to have an avro/thrift sink of the first hop
+pointing to the avro/thrift source of the next hop. This will result in the first
+Flume agent forwarding events to the next Flume agent. For example, if you are
+periodically sending files (1 file per event) using an avro client to a local
+Flume agent, then this local agent can forward the events to another agent that has
+the storage mounted.</p>
+<p>Weblog agent config:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># list sources, sinks and channels in the agent</span>
+<span class="na">agent_foo.sources</span> <span class="o">=</span> <span class="s">avro-AppSrv-source</span>
+<span class="na">agent_foo.sinks</span> <span class="o">=</span> <span class="s">avro-forward-sink</span>
+<span class="na">agent_foo.channels</span> <span class="o">=</span> <span class="s">file-channel</span>
+
+<span class="c"># define the flow</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source.channels</span> <span class="o">=</span> <span class="s">file-channel</span>
+<span class="na">agent_foo.sinks.avro-forward-sink.channel</span> <span class="o">=</span> <span class="s">file-channel</span>
+
+<span class="c"># avro sink properties</span>
+<span class="na">agent_foo.sinks.avro-forward-sink.type</span> <span class="o">=</span> <span class="s">avro</span>
+<span class="na">agent_foo.sinks.avro-forward-sink.hostname</span> <span class="o">=</span> <span class="s">10.1.1.100</span>
+<span class="na">agent_foo.sinks.avro-forward-sink.port</span> <span class="o">=</span> <span class="s">10000</span>
+
+<span class="c"># configure other pieces</span>
+<span class="c">#...</span>
+</pre></div>
+</div>
+<p>HDFS agent config:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># list sources, sinks and channels in the agent</span>
+<span class="na">agent_foo.sources</span> <span class="o">=</span> <span class="s">avro-collection-source</span>
+<span class="na">agent_foo.sinks</span> <span class="o">=</span> <span class="s">hdfs-sink</span>
+<span class="na">agent_foo.channels</span> <span class="o">=</span> <span class="s">mem-channel</span>
+
+<span class="c"># define the flow</span>
+<span class="na">agent_foo.sources.avro-collection-source.channels</span> <span class="o">=</span> <span class="s">mem-channel</span>
+<span class="na">agent_foo.sinks.hdfs-sink.channel</span> <span class="o">=</span> <span class="s">mem-channel</span>
+
+<span class="c"># avro source properties</span>
+<span class="na">agent_foo.sources.avro-collection-source.type</span> <span class="o">=</span> <span class="s">avro</span>
+<span class="na">agent_foo.sources.avro-collection-source.bind</span> <span class="o">=</span> <span class="s">10.1.1.100</span>
+<span class="na">agent_foo.sources.avro-collection-source.port</span> <span class="o">=</span> <span class="s">10000</span>
+
+<span class="c"># configure other pieces</span>
+<span class="c">#...</span>
+</pre></div>
+</div>
+<p>Here we link the avro-forward-sink from the weblog agent to the
+avro-collection-source of the hdfs agent. This will result in the events coming
+from the external appserver source eventually getting stored in HDFS.</p>
+</div>
+<div class="section" id="fan-out-flow">
+<h3>Fan out flow<a class="headerlink" href="#fan-out-flow" title="Permalink to this headline">¶</a></h3>
+<p>As discussed in the previous section, Flume supports fanning out the flow from one
+source to multiple channels. There are two modes of fan out, replicating and
+multiplexing. In the replicating flow, the event is sent to all the configured
+channels. In the multiplexing case, the event is sent to only a subset of
+qualifying channels. To fan out the flow, one needs to specify a list of
+channels for a source and the policy for fanning it out. This is done by
+adding a channel &#8220;selector&#8221; that can be replicating or multiplexing, and then
+further specifying the selection rules if it&#8217;s a multiplexer. If you don&#8217;t specify
+a selector, then by default it&#8217;s replicating:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># List the sources, sinks and channels for the agent</span>
+<span class="na">&lt;Agent&gt;.sources</span> <span class="o">=</span> <span class="s">&lt;Source1&gt;</span>
+<span class="na">&lt;Agent&gt;.sinks</span> <span class="o">=</span> <span class="s">&lt;Sink1&gt; &lt;Sink2&gt;</span>
+<span class="na">&lt;Agent&gt;.channels</span> <span class="o">=</span> <span class="s">&lt;Channel1&gt; &lt;Channel2&gt;</span>
+
+<span class="c"># set list of channels for source (separated by space)</span>
+<span class="na">&lt;Agent&gt;.sources.&lt;Source1&gt;.channels</span> <span class="o">=</span> <span class="s">&lt;Channel1&gt; &lt;Channel2&gt;</span>
+
+<span class="c"># set channel for sinks</span>
+<span class="na">&lt;Agent&gt;.sinks.&lt;Sink1&gt;.channel</span> <span class="o">=</span> <span class="s">&lt;Channel1&gt;</span>
+<span class="na">&lt;Agent&gt;.sinks.&lt;Sink2&gt;.channel</span> <span class="o">=</span> <span class="s">&lt;Channel2&gt;</span>
+
+<span class="na">&lt;Agent&gt;.sources.&lt;Source1&gt;.selector.type</span> <span class="o">=</span> <span class="s">replicating</span>
+</pre></div>
+</div>
+<p>The multiplexing selector has a further set of properties to bifurcate the flow.
+This requires specifying a mapping of an event attribute to a set of channels.
+The selector checks each configured attribute in the event header. If it
+matches the specified value, then that event is sent to all the channels mapped
+to that value. If there&#8217;s no match, then the event is sent to the set of channels
+configured as default:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># Mapping for multiplexing selector</span>
+<span class="na">&lt;Agent&gt;.sources.&lt;Source1&gt;.selector.type</span> <span class="o">=</span> <span class="s">multiplexing</span>
+<span class="na">&lt;Agent&gt;.sources.&lt;Source1&gt;.selector.header</span> <span class="o">=</span> <span class="s">&lt;someHeader&gt;</span>
+<span class="na">&lt;Agent&gt;.sources.&lt;Source1&gt;.selector.mapping.&lt;Value1&gt;</span> <span class="o">=</span> <span class="s">&lt;Channel1&gt;</span>
+<span class="na">&lt;Agent&gt;.sources.&lt;Source1&gt;.selector.mapping.&lt;Value2&gt;</span> <span class="o">=</span> <span class="s">&lt;Channel1&gt; &lt;Channel2&gt;</span>
+<span class="na">&lt;Agent&gt;.sources.&lt;Source1&gt;.selector.mapping.&lt;Value3&gt;</span> <span class="o">=</span> <span class="s">&lt;Channel2&gt;</span>
+<span class="c">#...</span>
+
+<span class="na">&lt;Agent&gt;.sources.&lt;Source1&gt;.selector.default</span> <span class="o">=</span> <span class="s">&lt;Channel2&gt;</span>
+</pre></div>
+</div>
+<p>The mapping allows overlapping the channels for each value.</p>
+<p>The following example has a single flow that is multiplexed to two paths. The
+agent named agent_foo has a single avro source and two channels linked to two sinks:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># list the sources, sinks and channels in the agent</span>
+<span class="na">agent_foo.sources</span> <span class="o">=</span> <span class="s">avro-AppSrv-source1</span>
+<span class="na">agent_foo.sinks</span> <span class="o">=</span> <span class="s">hdfs-Cluster1-sink1 avro-forward-sink2</span>
+<span class="na">agent_foo.channels</span> <span class="o">=</span> <span class="s">mem-channel-1 file-channel-2</span>
+
+<span class="c"># set channels for source</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.channels</span> <span class="o">=</span> <span class="s">mem-channel-1 file-channel-2</span>
+
+<span class="c"># set channel for sinks</span>
+<span class="na">agent_foo.sinks.hdfs-Cluster1-sink1.channel</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+<span class="na">agent_foo.sinks.avro-forward-sink2.channel</span> <span class="o">=</span> <span class="s">file-channel-2</span>
+
+<span class="c"># channel selector configuration</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.type</span> <span class="o">=</span> <span class="s">multiplexing</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.header</span> <span class="o">=</span> <span class="s">State</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ</span> <span class="o">=</span> <span class="s">file-channel-2</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY</span> <span class="o">=</span> <span class="s">mem-channel-1 file-channel-2</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.default</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+</pre></div>
+</div>
+<p>The selector checks for a header called &#8220;State&#8221;. If the value is &#8220;CA&#8221; then it&#8217;s
+sent to mem-channel-1, if it&#8217;s &#8220;AZ&#8221; then it goes to file-channel-2, and if it&#8217;s
+&#8220;NY&#8221; then it goes to both. If the &#8220;State&#8221; header is not set or doesn&#8217;t match any of the
+three, then it goes to mem-channel-1, which is designated as &#8216;default&#8217;.</p>
+<p>The selector also supports optional channels. To specify optional channels for
+a header, the config parameter &#8216;optional&#8217; is used in the following way:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="c"># channel selector configuration</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.type</span> <span class="o">=</span> <span class="s">multiplexing</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.header</span> <span class="o">=</span> <span class="s">State</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ</span> <span class="o">=</span> <span class="s">file-channel-2</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY</span> <span class="o">=</span> <span class="s">mem-channel-1 file-channel-2</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.optional.CA</span> <span class="o">=</span> <span class="s">mem-channel-1 file-channel-2</span>
+<span class="na">agent_foo.sources.avro-AppSrv-source1.selector.default</span> <span class="o">=</span> <span class="s">mem-channel-1</span>
+</pre></div>
+</div>
+<p>The selector will attempt to write to the required channels first and will fail
+the transaction if even one of these channels fails to consume the events. The
+transaction is reattempted on <strong>all</strong> of the channels. Once all required
+channels have consumed the events, then the selector will attempt to write to
+the optional channels. A failure by any of the optional channels to consume the
+event is simply ignored and not retried.</p>
+<p>If there is an overlap between the optional channels and required channels for a
+specific header, the channel is considered to be required, and a failure in the
+channel will cause the entire set of required channels to be retried. For
+instance, in the above example, for the header &#8220;CA&#8221; mem-channel-1 is considered
+to be a required channel even though it is marked both as required and optional,
+and a failure to write to this channel will cause that
+event to be retried on <strong>all</strong> channels configured for the selector.</p>
+<p>Note that if a header does not have any required channels, the event is
+written to the default channels and an attempt is also made to write it to the
+optional channels for that header. In other words, specifying optional channels still causes
+the event to be written to the default channels when no required channels are
+specified. If no channels are designated as default and there are no required channels,
+the selector will attempt to write the events to the optional channels only. Any
+failures are simply ignored in that case.</p>
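+<p>As a sketch of this behavior, suppose the configuration above also mapped a header
+value to only an optional channel (the value &#8220;TX&#8221; here is a hypothetical addition,
+not part of the example above):</p>
+<div class="highlight-properties"><div class="highlight"><pre># hypothetical: &quot;TX&quot; has no required channels, only an optional one
+agent_foo.sources.avro-AppSrv-source1.selector.optional.TX = file-channel-2
+
+# an event with State=TX is written to the default channel (mem-channel-1)
+# and also attempted on file-channel-2; a failure there is simply ignored
+</pre></div></div>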
+</div>
+<div class="section" id="flume-sources">
+<h3>Flume Sources<a class="headerlink" href="#flume-sources" title="Permalink to this headline">¶</a></h3>
+<div class="section" id="avro-source">
+<h4>Avro Source<a class="headerlink" href="#avro-source" title="Permalink to this headline">¶</a></h4>
+<p>Listens on Avro port and receives events from external Avro client streams.
+When paired with the built-in Avro Sink on another (previous hop) Flume agent,
+it can create tiered collection topologies.
+Required properties are in <strong>bold</strong>.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="11%" />
+<col width="10%" />
+<col width="78%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>channels</strong></td>
+<td>&#8211;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-odd"><td><strong>type</strong></td>
+<td>&#8211;</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span class="pre">avro</span></tt></td>
+</tr>
+<tr class="row-even"><td><strong>bind</strong></td>
+<td>&#8211;</td>
+<td>hostname or IP address to listen on</td>
+</tr>
+<tr class="row-odd"><td><strong>port</strong></td>
+<td>&#8211;</td>
+<td>Port # to bind to</td>
+</tr>
+<tr class="row-even"><td>threads</td>
+<td>&#8211;</td>
+<td>Maximum number of worker threads to spawn</td>
+</tr>
+<tr class="row-odd"><td>selector.type</td>
+<td>&nbsp;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-even"><td>selector.*</td>
+<td>&nbsp;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-odd"><td>interceptors</td>
+<td>&#8211;</td>
+<td>Space-separated list of interceptors</td>
+</tr>
+<tr class="row-even"><td>interceptors.*</td>
+<td>&nbsp;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-odd"><td>compression-type</td>
+<td>none</td>
+<td>This can be &#8220;none&#8221; or &#8220;deflate&#8221;.  The compression-type must match the compression-type of the matching AvroSink</td>
+</tr>
+<tr class="row-even"><td>ssl</td>
+<td>false</td>
+<td>Set this to true to enable SSL encryption. You must also specify a &#8220;keystore&#8221; and a &#8220;keystore-password&#8221;.</td>
+</tr>
+<tr class="row-odd"><td>keystore</td>
+<td>&#8211;</td>
+<td>This is the path to a Java keystore file. Required for SSL.</td>
+</tr>
+<tr class="row-even"><td>keystore-password</td>
+<td>&#8211;</td>
+<td>The password for the Java keystore. Required for SSL.</td>
+</tr>
+<tr class="row-odd"><td>keystore-type</td>
+<td>JKS</td>
+<td>The type of the Java keystore. This can be &#8220;JKS&#8221; or &#8220;PKCS12&#8221;.</td>
+</tr>
+<tr class="row-even"><td>exclude-protocols</td>
+<td>SSLv3</td>
+<td>Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified.</td>
+</tr>
+<tr class="row-odd"><td>ipFilter</td>
+<td>false</td>
+<td>Set this to true to enable ipFiltering for netty</td>
+</tr>
+<tr class="row-even"><td>ipFilterRules</td>
+<td>&#8211;</td>
+<td>Define N netty ipFilter pattern rules with this config.</td>
+</tr>
+</tbody>
+</table>
+<p>Example for agent named a1:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources</span> <span class="o">=</span> <span class="s">r1</span>
+<span class="na">a1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sources.r1.type</span> <span class="o">=</span> <span class="s">avro</span>
+<span class="na">a1.sources.r1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sources.r1.bind</span> <span class="o">=</span> <span class="s">0.0.0.0</span>
+<span class="na">a1.sources.r1.port</span> <span class="o">=</span> <span class="s">4141</span>
+</pre></div>
+</div>
+<p>Example of ipFilterRules:</p>
+<p>ipFilterRules defines N netty ipFilters separated by a comma; each pattern rule must be in this format:</p>
+<p>&lt;&#8217;allow&#8217; or &#8216;deny&#8217;&gt;:&lt;&#8217;ip&#8217; or &#8216;name&#8217; for computer name&gt;:&lt;pattern&gt;
+or
+allow/deny:ip/name:pattern</p>
+<p>example: ipFilterRules=allow:ip:127.*,allow:name:localhost,deny:ip:*</p>
+<p>Note that the first rule to match will apply, as the examples below show for a client on the localhost.</p>
+<p>&#8220;allow:name:localhost,deny:ip:*&#8221; will allow the client on localhost and deny clients from any other IP.
+&#8220;deny:name:localhost,allow:ip:*&#8221; will deny the client on localhost and allow clients from any other IP.</p>
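+<p>Expressed as agent properties, the first rule set above would look like this
+(reusing the a1/r1 names from the earlier example):</p>
+<div class="highlight-properties"><div class="highlight"><pre># allow the localhost client, deny clients from any other ip
+a1.sources.r1.ipFilter = true
+a1.sources.r1.ipFilterRules = allow:ip:127.*,allow:name:localhost,deny:ip:*
+</pre></div></div>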
+</div>
+<div class="section" id="thrift-source">
+<h4>Thrift Source<a class="headerlink" href="#thrift-source" title="Permalink to this headline">¶</a></h4>
+<p>Listens on Thrift port and receives events from external Thrift client streams.
+When paired with the built-in ThriftSink on another (previous hop) Flume agent,
+it can create tiered collection topologies.
+Thrift source can be configured to start in secure mode by enabling kerberos authentication.
+agent-principal and agent-keytab are the properties used by the
+Thrift source to authenticate to the kerberos KDC.
+Required properties are in <strong>bold</strong>.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="5%" />
+<col width="3%" />
+<col width="91%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>channels</strong></td>
+<td>&#8211;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-odd"><td><strong>type</strong></td>
+<td>&#8211;</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span class="pre">thrift</span></tt></td>
+</tr>
+<tr class="row-even"><td><strong>bind</strong></td>
+<td>&#8211;</td>
+<td>hostname or IP address to listen on</td>
+</tr>
+<tr class="row-odd"><td><strong>port</strong></td>
+<td>&#8211;</td>
+<td>Port # to bind to</td>
+</tr>
+<tr class="row-even"><td>threads</td>
+<td>&#8211;</td>
+<td>Maximum number of worker threads to spawn</td>
+</tr>
+<tr class="row-odd"><td>selector.type</td>
+<td>&nbsp;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-even"><td>selector.*</td>
+<td>&nbsp;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-odd"><td>interceptors</td>
+<td>&#8211;</td>
+<td>Space separated list of interceptors</td>
+</tr>
+<tr class="row-even"><td>interceptors.*</td>
+<td>&nbsp;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-odd"><td>ssl</td>
+<td>false</td>
+<td>Set this to true to enable SSL encryption. You must also specify a &#8220;keystore&#8221; and a &#8220;keystore-password&#8221;.</td>
+</tr>
+<tr class="row-even"><td>keystore</td>
+<td>&#8211;</td>
+<td>This is the path to a Java keystore file. Required for SSL.</td>
+</tr>
+<tr class="row-odd"><td>keystore-password</td>
+<td>&#8211;</td>
+<td>The password for the Java keystore. Required for SSL.</td>
+</tr>
+<tr class="row-even"><td>keystore-type</td>
+<td>JKS</td>
+<td>The type of the Java keystore. This can be &#8220;JKS&#8221; or &#8220;PKCS12&#8221;.</td>
+</tr>
+<tr class="row-odd"><td>exclude-protocols</td>
+<td>SSLv3</td>
+<td>Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified.</td>
+</tr>
+<tr class="row-even"><td>kerberos</td>
+<td>false</td>
+<td>Set to true to enable kerberos authentication. In kerberos mode, agent-principal and agent-keytab  are required for successful authentication. The Thrift source in secure mode, will accept connections only from Thrift clients that have kerberos enabled and are successfully authenticated to the kerberos KDC.</td>
+</tr>
+<tr class="row-odd"><td>agent-principal</td>
+<td>&#8211;</td>
+<td>The kerberos principal used by the Thrift Source to authenticate to the kerberos KDC.</td>
+</tr>
+<tr class="row-even"><td>agent-keytab</td>
+<td>&#8211;</td>
+<td>The keytab location used by the Thrift Source in combination with the agent-principal to authenticate to the kerberos KDC.</td>
+</tr>
+</tbody>
+</table>
+<p>Example for agent named a1:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources</span> <span class="o">=</span> <span class="s">r1</span>
+<span class="na">a1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sources.r1.type</span> <span class="o">=</span> <span class="s">thrift</span>
+<span class="na">a1.sources.r1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sources.r1.bind</span> <span class="o">=</span> <span class="s">0.0.0.0</span>
+<span class="na">a1.sources.r1.port</span> <span class="o">=</span> <span class="s">4141</span>
+</pre></div>
+</div>
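+<p>A Thrift source running in secure mode can be sketched using the kerberos properties
+from the table above; the principal and keytab values here are placeholders:</p>
+<div class="highlight-properties"><div class="highlight"><pre># kerberos mode (principal and keytab values are placeholders)
+a1.sources.r1.kerberos = true
+a1.sources.r1.agent-principal = flume/agent-host@EXAMPLE.COM
+a1.sources.r1.agent-keytab = /etc/flume/conf/flume.keytab
+</pre></div></div>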
+</div>
+<div class="section" id="exec-source">
+<h4>Exec Source<a class="headerlink" href="#exec-source" title="Permalink to this headline">¶</a></h4>
+<p>Exec source runs a given Unix command on start-up and expects that process to
+continuously produce data on standard out (stderr is simply discarded, unless
+property logStdErr is set to true). If the process exits for any reason, the source also exits and
+will produce no further data. This means configurations such as <tt class="docutils literal"><span class="pre">cat</span> <span class="pre">[named</span> <span class="pre">pipe]</span></tt>
+or <tt class="docutils literal"><span class="pre">tail</span> <span class="pre">-F</span> <span class="pre">[file]</span></tt> are going to produce the desired results whereas <tt class="docutils literal"><span class="pre">date</span></tt>
+will probably not - the former two commands produce streams of data whereas the
+latter produces a single event and exits.</p>
+<p>Required properties are in <strong>bold</strong>.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="8%" />
+<col width="6%" />
+<col width="85%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>channels</strong></td>
+<td>&#8211;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-odd"><td><strong>type</strong></td>
+<td>&#8211;</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span class="pre">exec</span></tt></td>
+</tr>
+<tr class="row-even"><td><strong>command</strong></td>
+<td>&#8211;</td>
+<td>The command to execute</td>
+</tr>
+<tr class="row-odd"><td>shell</td>
+<td>&#8211;</td>
+<td>A shell invocation used to run the command.  e.g. /bin/sh -c. Required only for commands relying on shell features like wildcards, back ticks, pipes etc.</td>
+</tr>
+<tr class="row-even"><td>restartThrottle</td>
+<td>10000</td>
+<td>Amount of time (in millis) to wait before attempting a restart</td>
+</tr>
+<tr class="row-odd"><td>restart</td>
+<td>false</td>
+<td>Whether the executed cmd should be restarted if it dies</td>
+</tr>
+<tr class="row-even"><td>logStdErr</td>
+<td>false</td>
+<td>Whether the command&#8217;s stderr should be logged</td>
+</tr>
+<tr class="row-odd"><td>batchSize</td>
+<td>20</td>
+<td>The max number of lines to read and send to the channel at a time</td>
+</tr>
+<tr class="row-even"><td>batchTimeout</td>
+<td>3000</td>
+<td>Amount of time (in milliseconds) to wait, if the buffer size was not reached, before data is pushed downstream</td>
+</tr>
+<tr class="row-odd"><td>selector.type</td>
+<td>replicating</td>
+<td>replicating or multiplexing</td>
+</tr>
+<tr class="row-even"><td>selector.*</td>
+<td>&nbsp;</td>
+<td>Depends on the selector.type value</td>
+</tr>
+<tr class="row-odd"><td>interceptors</td>
+<td>&#8211;</td>
+<td>Space-separated list of interceptors</td>
+</tr>
+<tr class="row-even"><td>interceptors.*</td>
+<td>&nbsp;</td>
+<td>&nbsp;</td>
+</tr>
+</tbody>
+</table>
+<div class="admonition warning">
+<p class="first admonition-title">Warning</p>
+<p class="last">The problem with ExecSource and other asynchronous sources is that the
+source can not guarantee that if there is a failure to put the event
+into the Channel the client knows about it. In such cases, the data will
+be lost. For instance, one of the most commonly requested features
+is the <tt class="docutils literal"><span class="pre">tail</span> <span class="pre">-F</span> <span class="pre">[file]</span></tt>-like use case where an application writes
+to a log file on disk and Flume tails the file, sending each line as an
+event. While this is possible, there&#8217;s an obvious problem; what happens
+if the channel fills up and Flume can&#8217;t send an event? Flume has no way
+of indicating to the application writing the log file that it needs to
+retain the log or that the event hasn&#8217;t been sent, for some reason. If
+this doesn&#8217;t make sense, you need only know this: Your application can
+never guarantee data has been received when using a unidirectional
+asynchronous interface such as ExecSource! As an extension of this
+warning - and to be completely clear - there is absolutely zero guarantee
+of event delivery when using this source. For stronger reliability
+guarantees, consider the Spooling Directory Source, Taildir Source or direct integration
+with Flume via the SDK.</p>
+</div>
+<p>Example for agent named a1:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources</span> <span class="o">=</span> <span class="s">r1</span>
+<span class="na">a1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sources.r1.type</span> <span class="o">=</span> <span class="s">exec</span>
+<span class="na">a1.sources.r1.command</span> <span class="o">=</span> <span class="s">tail -F /var/log/secure</span>
+<span class="na">a1.sources.r1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+</pre></div>
+</div>
+<p>The &#8216;shell&#8217; config is used to invoke the &#8216;command&#8217; through a command shell (such as Bash
+or PowerShell). The &#8216;command&#8217; is passed as an argument to &#8216;shell&#8217; for execution. This
+allows the &#8216;command&#8217; to use features from the shell such as wildcards, back ticks, pipes,
+loops, conditionals etc. In the absence of the &#8216;shell&#8217; config, the &#8216;command&#8217; will be
+invoked directly.  Common values for &#8216;shell&#8217; :  &#8216;/bin/sh -c&#8217;, &#8216;/bin/ksh -c&#8217;,
+&#8216;cmd /c&#8217;,  &#8216;powershell -Command&#8217;, etc.</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources.tailsource-1.type</span> <span class="o">=</span> <span class="s">exec</span>
+<span class="na">a1.sources.tailsource-1.shell</span> <span class="o">=</span> <span class="s">/bin/bash -c</span>
+<span class="na">a1.sources.tailsource-1.command</span> <span class="o">=</span> <span class="s">for i in /path/*.txt; do cat $i; done</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="jms-source">
+<h4>JMS Source<a class="headerlink" href="#jms-source" title="Permalink to this headline">¶</a></h4>
+<p>JMS Source reads messages from a JMS destination such as a queue or topic. As a JMS
+application, it should work with any JMS provider, but it has only been tested with ActiveMQ.
+The JMS source provides a configurable batch size, message selector, user/pass, and
+message-to-Flume-event converter. Note that the vendor-provided JMS jars should be included in the
+Flume classpath using the plugins.d directory (preferred), the &#8211;classpath option on the command line, or
+via the FLUME_CLASSPATH variable in flume-env.sh.</p>
+<p>Required properties are in <strong>bold</strong>.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="20%" />
+<col width="9%" />
+<col width="71%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>channels</strong></td>
+<td>&#8211;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-odd"><td><strong>type</strong></td>
+<td>&#8211;</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span class="pre">jms</span></tt></td>
+</tr>
+<tr class="row-even"><td><strong>initialContextFactory</strong></td>
+<td>&#8211;</td>
+<td>Initial Context Factory, e.g. org.apache.activemq.jndi.ActiveMQInitialContextFactory</td>
+</tr>
+<tr class="row-odd"><td><strong>connectionFactory</strong></td>
+<td>&#8211;</td>
+<td>The JNDI name the connection factory should appear as</td>
+</tr>
+<tr class="row-even"><td><strong>providerURL</strong></td>
+<td>&#8211;</td>
+<td>The JMS provider URL</td>
+</tr>
+<tr class="row-odd"><td><strong>destinationName</strong></td>
+<td>&#8211;</td>
+<td>Destination name</td>
+</tr>
+<tr class="row-even"><td><strong>destinationType</strong></td>
+<td>&#8211;</td>
+<td>Destination type (queue or topic)</td>
+</tr>
+<tr class="row-odd"><td>messageSelector</td>
+<td>&#8211;</td>
+<td>Message selector to use when creating the consumer</td>
+</tr>
+<tr class="row-even"><td>userName</td>
+<td>&#8211;</td>
+<td>Username for the destination/provider</td>
+</tr>
+<tr class="row-odd"><td>passwordFile</td>
+<td>&#8211;</td>
+<td>File containing the password for the destination/provider</td>
+</tr>
+<tr class="row-even"><td>batchSize</td>
+<td>100</td>
+<td>Number of messages to consume in one batch</td>
+</tr>
+<tr class="row-odd"><td>converter.type</td>
+<td>DEFAULT</td>
+<td>Class to use to convert messages to flume events. See below.</td>
+</tr>
+<tr class="row-even"><td>converter.*</td>
+<td>&#8211;</td>
+<td>Converter properties.</td>
+</tr>
+<tr class="row-odd"><td>converter.charset</td>
+<td>UTF-8</td>
+<td>Default converter only. Charset to use when converting JMS TextMessages to byte arrays.</td>
+</tr>
+<tr class="row-even"><td>createDurableSubscription</td>
+<td>false</td>
+<td>Whether to create a durable subscription. A durable subscription can only be used with
+destinationType topic. If true, &#8220;clientId&#8221; and &#8220;durableSubscriptionName&#8221;
+have to be specified.</td>
+</tr>
+<tr class="row-odd"><td>clientId</td>
+<td>&#8211;</td>
+<td>JMS client identifier set on Connection right after it is created.
+Required for durable subscriptions.</td>
+</tr>
+<tr class="row-even"><td>durableSubscriptionName</td>
+<td>&#8211;</td>
+<td>Name used to identify the durable subscription. Required for durable subscriptions.</td>
+</tr>
+</tbody>
+</table>
+<div class="section" id="converter">
+<h5>Converter<a class="headerlink" href="#converter" title="Permalink to this headline">¶</a></h5>
+<p>The JMS source allows pluggable converters, though it&#8217;s likely the default converter will work
+for most purposes. The default converter is able to convert Bytes, Text, and Object messages
+to FlumeEvents. In all cases, the properties in the message are added as headers to the
+FlumeEvent.</p>
+<dl class="docutils">
+<dt>BytesMessage:</dt>
+<dd>Bytes of message are copied to body of the FlumeEvent. Cannot convert more than 2GB
+of data per message.</dd>
+<dt>TextMessage:</dt>
+<dd>Text of message is converted to a byte array and copied to the body of the
+FlumeEvent. The default converter uses UTF-8 by default but this is configurable.</dd>
+<dt>ObjectMessage:</dt>
+<dd>Object is written out to a ByteArrayOutputStream wrapped in an ObjectOutputStream and
+the resulting array is copied to the body of the FlumeEvent.</dd>
+</dl>
+<p>Example for agent named a1:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources</span> <span class="o">=</span> <span class="s">r1</span>
+<span class="na">a1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sources.r1.type</span> <span class="o">=</span> <span class="s">jms</span>
+<span class="na">a1.sources.r1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sources.r1.initialContextFactory</span> <span class="o">=</span> <span class="s">org.apache.activemq.jndi.ActiveMQInitialContextFactory</span>
+<span class="na">a1.sources.r1.connectionFactory</span> <span class="o">=</span> <span class="s">GenericConnectionFactory</span>
+<span class="na">a1.sources.r1.providerURL</span> <span class="o">=</span> <span class="s">tcp://mqserver:61616</span>
+<span class="na">a1.sources.r1.destinationName</span> <span class="o">=</span> <span class="s">BUSINESS_DATA</span>
+<span class="na">a1.sources.r1.destinationType</span> <span class="o">=</span> <span class="s">QUEUE</span>
+</pre></div>
+</div>
+</div>
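+<p>For a topic, a durable subscription can be configured as sketched below; the client id,
+subscription name, and topic name used here are illustrative placeholders:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources.r1.type</span> <span class="o">=</span> <span class="s">jms</span>
+<span class="na">a1.sources.r1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sources.r1.initialContextFactory</span> <span class="o">=</span> <span class="s">org.apache.activemq.jndi.ActiveMQInitialContextFactory</span>
+<span class="na">a1.sources.r1.connectionFactory</span> <span class="o">=</span> <span class="s">GenericConnectionFactory</span>
+<span class="na">a1.sources.r1.providerURL</span> <span class="o">=</span> <span class="s">tcp://mqserver:61616</span>
+<span class="na">a1.sources.r1.destinationName</span> <span class="o">=</span> <span class="s">BUSINESS_TOPIC</span>
+<span class="na">a1.sources.r1.destinationType</span> <span class="o">=</span> <span class="s">TOPIC</span>
+<span class="na">a1.sources.r1.createDurableSubscription</span> <span class="o">=</span> <span class="s">true</span>
+<span class="na">a1.sources.r1.clientId</span> <span class="o">=</span> <span class="s">flume-consumer-1</span>
+<span class="na">a1.sources.r1.durableSubscriptionName</span> <span class="o">=</span> <span class="s">flume-subscription-1</span>
+</pre></div>
+</div>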
+</div>
+<div class="section" id="spooling-directory-source">
+<h4>Spooling Directory Source<a class="headerlink" href="#spooling-directory-source" title="Permalink to this headline">¶</a></h4>
+<p>This source lets you ingest data by placing files to be ingested into a
+&#8220;spooling&#8221; directory on disk.
+This source will watch the specified directory for new files, and will parse
+events out of new files as they appear.
+The event parsing logic is pluggable.
+After a given file has been fully read
+into the channel, it is renamed to indicate completion (or optionally deleted).</p>
+<p>Unlike the Exec source, this source is reliable and will not miss data, even if
+Flume is restarted or killed. In exchange for this reliability, only immutable,
+uniquely-named files must be dropped into the spooling directory. Flume tries
+to detect these problem conditions and will fail loudly if they are violated:</p>
+<ol class="arabic simple">
+<li>If a file is written to after being placed into the spooling directory,
+Flume will print an error to its log file and stop processing.</li>
+<li>If a file name is reused at a later time, Flume will print an error to its
+log file and stop processing.</li>
+</ol>
+<p>To avoid the above issues, it may be useful to add a unique identifier
+(such as a timestamp) to log file names when they are moved into the spooling
+directory.</p>
+<p>Despite the reliability guarantees of this source, there are still
+cases in which events may be duplicated if certain downstream failures occur.
+This is consistent with the guarantees offered by other Flume components.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="18%" />
+<col width="10%" />
+<col width="72%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>channels</strong></td>
+<td>&#8211;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-odd"><td><strong>type</strong></td>
+<td>&#8211;</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span class="pre">spooldir</span></tt>.</td>
+</tr>
+<tr class="row-even"><td><strong>spoolDir</strong></td>
+<td>&#8211;</td>
+<td>The directory from which to read files.</td>
+</tr>
+<tr class="row-odd"><td>fileSuffix</td>
+<td>.COMPLETED</td>
+<td>Suffix to append to completely ingested files</td>
+</tr>
+<tr class="row-even"><td>deletePolicy</td>
+<td>never</td>
+<td>When to delete completed files: <tt class="docutils literal"><span class="pre">never</span></tt> or <tt class="docutils literal"><span class="pre">immediate</span></tt></td>
+</tr>
+<tr class="row-odd"><td>fileHeader</td>
+<td>false</td>
+<td>Whether to add a header storing the absolute path filename.</td>
+</tr>
+<tr class="row-even"><td>fileHeaderKey</td>
+<td>file</td>
+<td>Header key to use when appending absolute path filename to event header.</td>
+</tr>
+<tr class="row-odd"><td>basenameHeader</td>
+<td>false</td>
+<td>Whether to add a header storing the basename of the file.</td>
+</tr>
+<tr class="row-even"><td>basenameHeaderKey</td>
+<td>basename</td>
+<td>Header Key to use when appending  basename of file to event header.</td>
+</tr>
+<tr class="row-odd"><td>includePattern</td>
+<td>^.*$</td>
+<td>Regular expression specifying which files to include.
+It can be used together with <tt class="docutils literal"><span class="pre">ignorePattern</span></tt>.
+If a file matches both <tt class="docutils literal"><span class="pre">ignorePattern</span></tt> and <tt class="docutils literal"><span class="pre">includePattern</span></tt> regex,
+the file is ignored.</td>
+</tr>
+<tr class="row-even"><td>ignorePattern</td>
+<td>^$</td>
+<td>Regular expression specifying which files to ignore (skip).
+It can be used together with <tt class="docutils literal"><span class="pre">includePattern</span></tt>.
+If a file matches both <tt class="docutils literal"><span class="pre">ignorePattern</span></tt> and <tt class="docutils literal"><span class="pre">includePattern</span></tt> regex,
+the file is ignored.</td>
+</tr>
+<tr class="row-odd"><td>trackerDir</td>
+<td>.flumespool</td>
+<td>Directory to store metadata related to processing of files.
+If this path is not an absolute path, then it is interpreted as relative to the spoolDir.</td>
+</tr>
+<tr class="row-even"><td>consumeOrder</td>
+<td>oldest</td>
+<td>The order in which files in the spooling directory will be consumed: <tt class="docutils literal"><span class="pre">oldest</span></tt>,
+<tt class="docutils literal"><span class="pre">youngest</span></tt> or <tt class="docutils literal"><span class="pre">random</span></tt>. In case of <tt class="docutils literal"><span class="pre">oldest</span></tt> and <tt class="docutils literal"><span class="pre">youngest</span></tt>, the last modified
+time of the files will be used to compare the files. In case of a tie, the file
+with the smallest lexicographical order will be consumed first. In case of <tt class="docutils literal"><span class="pre">random</span></tt>, any
+file will be picked randomly. When using <tt class="docutils literal"><span class="pre">oldest</span></tt> and <tt class="docutils literal"><span class="pre">youngest</span></tt>, the whole
+directory will be scanned to pick the oldest/youngest file, which might be slow if there
+are a large number of files, while using <tt class="docutils literal"><span class="pre">random</span></tt> may cause old files to be consumed
+very late if new files keep arriving in the spooling directory.</td>
+</tr>
+<tr class="row-odd"><td>pollDelay</td>
+<td>500</td>
+<td>Delay (in milliseconds) used when polling for new files.</td>
+</tr>
+<tr class="row-even"><td>recursiveDirectorySearch</td>
+<td>false</td>
+<td>Whether to monitor sub directories for new files to read.</td>
+</tr>
+<tr class="row-odd"><td>maxBackoff</td>
+<td>4000</td>
+<td>The maximum time (in millis) to wait between consecutive attempts to
+write to the channel(s) if the channel is full. The source will start at
+a low backoff and increase it exponentially each time the channel throws a
+ChannelException, up to the value specified by this parameter.</td>
+</tr>
+<tr class="row-even"><td>batchSize</td>
+<td>100</td>
+<td>Granularity at which to batch transfer to the channel</td>
+</tr>
+<tr class="row-odd"><td>inputCharset</td>
+<td>UTF-8</td>
+<td>Character set used by deserializers that treat the input file as text.</td>
+</tr>
+<tr class="row-even"><td>decodeErrorPolicy</td>
+<td><tt class="docutils literal"><span class="pre">FAIL</span></tt></td>
+<td>What to do when we see a non-decodable character in the input file.
+<tt class="docutils literal"><span class="pre">FAIL</span></tt>: Throw an exception and fail to parse the file.
+<tt class="docutils literal"><span class="pre">REPLACE</span></tt>: Replace the unparseable character with the &#8220;replacement character&#8221; char,
+typically Unicode U+FFFD.
+<tt class="docutils literal"><span class="pre">IGNORE</span></tt>: Drop the unparseable character sequence.</td>
+</tr>
+<tr class="row-odd"><td>deserializer</td>
+<td><tt class="docutils literal"><span class="pre">LINE</span></tt></td>
+<td>Specify the deserializer used to parse the file into events.
+Defaults to parsing each line as an event. The class specified must implement
+<tt class="docutils literal"><span class="pre">EventDeserializer.Builder</span></tt>.</td>
+</tr>
+<tr class="row-even"><td>deserializer.*</td>
+<td>&nbsp;</td>
+<td>Varies per event deserializer.</td>
+</tr>
+<tr class="row-odd"><td>bufferMaxLines</td>
+<td>&#8211;</td>
+<td>(Obsolete) This option is now ignored.</td>
+</tr>
+<tr class="row-even"><td>bufferMaxLineLength</td>
+<td>5000</td>
+<td>(Deprecated) Maximum length of a line in the commit buffer. Use deserializer.maxLineLength instead.</td>
+</tr>
+<tr class="row-odd"><td>selector.type</td>
+<td>replicating</td>
+<td>replicating or multiplexing</td>
+</tr>
+<tr class="row-even"><td>selector.*</td>
+<td>&nbsp;</td>
+<td>Depends on the selector.type value</td>
+</tr>
+<tr class="row-odd"><td>interceptors</td>
+<td>&#8211;</td>
+<td>Space-separated list of interceptors</td>
+</tr>
+<tr class="row-even"><td>interceptors.*</td>
+<td>&nbsp;</td>
+<td>&nbsp;</td>
+</tr>
+</tbody>
+</table>
+<p>Example for an agent named a1:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.channels</span> <span class="o">=</span> <span class="s">ch-1</span>
+<span class="na">a1.sources</span> <span class="o">=</span> <span class="s">src-1</span>
+
+<span class="na">a1.sources.src-1.type</span> <span class="o">=</span> <span class="s">spooldir</span>
+<span class="na">a1.sources.src-1.channels</span> <span class="o">=</span> <span class="s">ch-1</span>
+<span class="na">a1.sources.src-1.spoolDir</span> <span class="o">=</span> <span class="s">/var/log/apache/flumeSpool</span>
+<span class="na">a1.sources.src-1.fileHeader</span> <span class="o">=</span> <span class="s">true</span>
+</pre></div>
+</div>
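+<p>As a sketch, the <tt class="docutils literal"><span class="pre">includePattern</span></tt> and <tt class="docutils literal"><span class="pre">ignorePattern</span></tt>
+options can be combined to restrict which files are ingested; the file name patterns below are illustrative:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources.src-1.type</span> <span class="o">=</span> <span class="s">spooldir</span>
+<span class="na">a1.sources.src-1.channels</span> <span class="o">=</span> <span class="s">ch-1</span>
+<span class="na">a1.sources.src-1.spoolDir</span> <span class="o">=</span> <span class="s">/var/log/apache/flumeSpool</span>
+<span class="na">a1.sources.src-1.includePattern</span> <span class="o">=</span> <span class="s">^app-.*$</span>
+<span class="na">a1.sources.src-1.ignorePattern</span> <span class="o">=</span> <span class="s">^.*tmp$</span>
+</pre></div>
+</div>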
+<div class="section" id="event-deserializers">
+<h5>Event Deserializers<a class="headerlink" href="#event-deserializers" title="Permalink to this headline">¶</a></h5>
+<p>The following event deserializers ship with Flume.</p>
+<div class="section" id="line">
+<h6>LINE<a class="headerlink" href="#line" title="Permalink to this headline">¶</a></h6>
+<p>This deserializer generates one event per line of text input.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="29%" />
+<col width="14%" />
+<col width="57%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>deserializer.maxLineLength</td>
+<td>2048</td>
+<td>Maximum number of characters to include in a single event.
+If a line exceeds this length, it is truncated, and the
+remaining characters on the line will appear in a
+subsequent event.</td>
+</tr>
+<tr class="row-odd"><td>deserializer.outputCharset</td>
+<td>UTF-8</td>
+<td>Charset to use for encoding events put into the channel.</td>
+</tr>
+</tbody>
+</table>
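+<p>As a sketch, the line deserializer can be tuned on a spooling directory source as follows;
+the 10000-character limit is an illustrative value:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources.src-1.type</span> <span class="o">=</span> <span class="s">spooldir</span>
+<span class="na">a1.sources.src-1.spoolDir</span> <span class="o">=</span> <span class="s">/var/log/apache/flumeSpool</span>
+<span class="na">a1.sources.src-1.deserializer</span> <span class="o">=</span> <span class="s">LINE</span>
+<span class="na">a1.sources.src-1.deserializer.maxLineLength</span> <span class="o">=</span> <span class="s">10000</span>
+<span class="na">a1.sources.src-1.deserializer.outputCharset</span> <span class="o">=</span> <span class="s">UTF-8</span>
+</pre></div>
+</div>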
+</div>
+<div class="section" id="avro">
+<h6>AVRO<a class="headerlink" href="#avro" title="Permalink to this headline">¶</a></h6>
+<p>This deserializer is able to read an Avro container file, and it generates
+one event per Avro record in the file.
+Each event is annotated with a header that indicates the schema used.
+The body of the event is the binary Avro record data, not
+including the schema or the rest of the container file elements.</p>
+<p>Note that if the spool directory source must retry putting one of these events
+onto a channel (for example, because the channel is full), then it will reset
+and retry from the most recent Avro container file sync point. To reduce
+potential event duplication in such a failure scenario, write sync markers more
+frequently in your Avro input files.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="26%" />
+<col width="12%" />
+<col width="62%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>deserializer.schemaType</td>
+<td>HASH</td>
+<td>How the schema is represented. By default, or when the value <tt class="docutils literal"><span class="pre">HASH</span></tt>
+is specified, the Avro schema is hashed and
+the hash is stored in every event in the event header
+&#8220;flume.avro.schema.hash&#8221;. If <tt class="docutils literal"><span class="pre">LITERAL</span></tt> is specified, the JSON-encoded
+schema itself is stored in every event in the event header
+&#8220;flume.avro.schema.literal&#8221;. Using <tt class="docutils literal"><span class="pre">LITERAL</span></tt> mode is relatively
+inefficient compared to <tt class="docutils literal"><span class="pre">HASH</span></tt> mode.</td>
+</tr>
+</tbody>
+</table>
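+<p>For example, to store the full JSON-encoded schema in every event header instead of a hash
+(at the cost of larger events), the deserializer can be configured as sketched below; the spool
+directory path is illustrative:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources.src-1.type</span> <span class="o">=</span> <span class="s">spooldir</span>
+<span class="na">a1.sources.src-1.spoolDir</span> <span class="o">=</span> <span class="s">/var/log/avro/flumeSpool</span>
+<span class="na">a1.sources.src-1.deserializer</span> <span class="o">=</span> <span class="s">AVRO</span>
+<span class="na">a1.sources.src-1.deserializer.schemaType</span> <span class="o">=</span> <span class="s">LITERAL</span>
+</pre></div>
+</div>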
+</div>
+<div class="section" id="blobdeserializer">
+<h6>BlobDeserializer<a class="headerlink" href="#blobdeserializer" title="Permalink to this headline">¶</a></h6>
+<p>This deserializer reads a Binary Large Object (BLOB) per event, typically one BLOB per file; for example, a PDF or JPG file. Note that this approach is not suitable for very large objects because the entire BLOB is buffered in RAM.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="20%" />
+<col width="14%" />
+<col width="67%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>deserializer</strong></td>
+<td>&#8211;</td>
+<td>The FQCN of this class: <tt class="docutils literal"><span class="pre">org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder</span></tt></td>
+</tr>
+<tr class="row-odd"><td>deserializer.maxBlobLength</td>
+<td>100000000</td>
+<td>The maximum number of bytes to read and buffer for a given request</td>
+</tr>
+</tbody>
+</table>
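+<p>Since this deserializer is referenced by the fully qualified name of its builder class, a
+configuration sketch looks like the following; the spool directory and size limit are illustrative:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources.src-1.type</span> <span class="o">=</span> <span class="s">spooldir</span>
+<span class="na">a1.sources.src-1.spoolDir</span> <span class="o">=</span> <span class="s">/var/data/blobSpool</span>
+<span class="na">a1.sources.src-1.deserializer</span> <span class="o">=</span> <span class="s">org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder</span>
+<span class="na">a1.sources.src-1.deserializer.maxBlobLength</span> <span class="o">=</span> <span class="s">50000000</span>
+</pre></div>
+</div>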
+</div>
+</div>
+</div>
+<div class="section" id="taildir-source">
+<h4>Taildir Source<a class="headerlink" href="#taildir-source" title="Permalink to this headline">¶</a></h4>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last"><strong>This source is provided as a preview feature. It does not work on Windows.</strong></p>
+</div>
+<p>Watch the specified files, and tail them in near real-time once new lines appended to each file are detected.
+If new lines are being written, this source retries reading them while waiting for the write to complete.</p>
+<p>This source is reliable and will not miss data even when the tailed files rotate.
+It periodically writes the last read position of each file in the given position file in JSON format.
+If Flume is stopped or goes down for some reason, it can restart tailing from the position recorded in the existing position file.</p>
+<p>Alternatively, this source can start tailing from an arbitrary position for each file by using the given position file.
+When there is no position file at the specified path, it will start tailing from the first line of each file by default.</p>
+<p>Files will be consumed in order of their modification time. The file with the oldest modification time will be consumed first.</p>
+<p>This source does not rename, delete, or make any modifications to the file being tailed.
+Currently this source does not support tailing binary files. It reads text files line by line.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="19%" />
+<col width="16%" />
+<col width="65%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>channels</strong></td>
+<td>&#8211;</td>
+<td>&nbsp;</td>
+</tr>
+<tr class="row-odd"><td><strong>type</strong></td>
+<td>&#8211;</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span class="pre">TAILDIR</span></tt>.</td>
+</tr>
+<tr class="row-even"><td><strong>filegroups</strong></td>
+<td>&#8211;</td>
+<td>Space-separated list of file groups. Each file group indicates a set of files to be tailed.</td>
+</tr>
+<tr class="row-odd"><td><strong>filegroups.&lt;filegroupName&gt;</strong></td>
+<td>&#8211;</td>
+<td>Absolute path of the file group. Regular expressions (and not file system patterns) can be used for the filename only.</td>
+</tr>
+<tr class="row-even"><td>positionFile</td>
+<td>~/.flume/taildir_position.json</td>
+<td>File in JSON format to record the inode, the absolute path and the last position of each tailing file.</td>
+</tr>
+<tr class="row-odd"><td>headers.&lt;filegroupName&gt;.&lt;headerKey&gt;</td>
+<td>&#8211;</td>
+<td>Header value which is set with the header key. Multiple headers can be specified for one file group.</td>
+</tr>
+<tr class="row-even"><td>byteOffsetHeader</td>
+<td>false</td>
+<td>Whether to add the byte offset of a tailed line to a header called &#8216;byteoffset&#8217;.</td>
+</tr>
+<tr class="row-odd"><td>skipToEnd</td>
+<td>false</td>
+<td>Whether to skip to the end of the file (EOF) for files that are not recorded in the position file.</td>
+</tr>
+<tr class="row-even"><td>idleTimeout</td>
+<td>120000</td>
+<td>Time (ms) after which inactive files are closed. If new lines are appended to a closed file, this source will automatically re-open it.</td>
+</tr>
+<tr class="row-odd"><td>writePosInterval</td>
+<td>3000</td>
+<td>Interval time (ms) at which to write the last position of each file in the position file.</td>
+</tr>
+<tr class="row-even"><td>batchSize</td>
+<td>100</td>
+<td>Max number of lines to read and send to the channel at a time. Using the default is usually fine.</td>
+</tr>
+<tr class="row-odd"><td>backoffSleepIncrement</td>
+<td>1000</td>
+<td>The increment for time delay before reattempting to poll for new data, when the last attempt did not find any new data.</td>
+</tr>
+<tr class="row-even"><td>maxBackoffSleep</td>
+<td>5000</td>
+<td>The max time delay between each reattempt to poll for new data, when the last attempt did not find any new data.</td>
+</tr>
+<tr class="row-odd"><td>cachePatternMatching</td>
+<td>true</td>
+<td>Listing directories and applying the filename regex pattern may be time consuming for directories
+containing thousands of files. Caching the list of matching files can improve performance.
+The order in which files are consumed will also be cached.
+Requires that the file system keeps track of modification times with at least a 1-second granularity.</td>
+</tr>
+<tr class="row-even"><td>fileHeader</td>
+<td>false</td>
+<td>Whether to add a header storing the absolute path filename.</td>
+</tr>
+<tr class="row-odd"><td>fileHeaderKey</td>
+<td>file</td>
+<td>Header key to use when appending absolute path filename to event header.</td>
+</tr>
+</tbody>
+</table>
+<p>Example for agent named a1:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span class="na">a1.sources</span> <span class="o">=</span> <span class="s">r1</span>
+<span class="na">a1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sources.r1.type</span> <span class="o">=</span> <span class="s">TAILDIR</span>
+<span class="na">a1.sources.r1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sources.r1.positionFile</span> <span class="o">=</span> <span class="s">/var/log/flume/taildir_position.json</span>
+<span class="na">a1.sources.r1.filegroups</span> <span class="o">=</span> <span class="s">f1 f2</span>
+<span class="na">a1.sources.r1.filegroups.f1</span> <span class="o">=</span> <span class="s">/var/log/test1/example.log</span>
+<span class="na">a1.sources.r1.headers.f1.headerKey1</span> <span class="o">=</span> <span class="s">value1</span>
+<span class="na">a1.sources.r1.filegroups.f2</span> <span class="o">=</span> <span class="s">/var/log/test2/.*log.*</span>
+<span class="na">a1.sources.r1.headers.f2.headerKey1</span> <span class="o">=</span> <span class="s">value2</span>
+<span class="na">a1.sources.r1.headers.f2.headerKey2</span> <span class="o">=</span> <span class="s">value2-2</span>
+<span class="na">a1.sources.r1.fileHeader</span> <span class="o">=</span> <span class="s">true</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="twitter-1-firehose-source-experimental">

[... 5941 lines stripped ...]