Posted to commits@flume.apache.org by mp...@apache.org on 2013/04/03 23:35:59 UTC

git commit: FLUME-1953. Fix dev guide error that says sink can read from multiple channels.

Updated Branches:
  refs/heads/trunk 44d899643 -> 172355c0a


FLUME-1953. Fix dev guide error that says sink can read from multiple channels.

(Israel Ekpo via Mike Percy)


Project: http://git-wip-us.apache.org/repos/asf/flume/repo
Commit: http://git-wip-us.apache.org/repos/asf/flume/commit/172355c0
Tree: http://git-wip-us.apache.org/repos/asf/flume/tree/172355c0
Diff: http://git-wip-us.apache.org/repos/asf/flume/diff/172355c0

Branch: refs/heads/trunk
Commit: 172355c0aa3cf488375739a106cd248637608cab
Parents: 44d8996
Author: Mike Percy <mp...@apache.org>
Authored: Wed Apr 3 14:35:32 2013 -0700
Committer: Mike Percy <mp...@apache.org>
Committed: Wed Apr 3 14:35:32 2013 -0700

----------------------------------------------------------------------
 flume-ng-doc/sphinx/FlumeDeveloperGuide.rst |    2 +-
 flume-ng-doc/sphinx/FlumeUserGuide.rst      |   89 +++++++++++-----------
 flume-ng-doc/sphinx/index.rst               |    2 +-
 3 files changed, 47 insertions(+), 46 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flume/blob/172355c0/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst
----------------------------------------------------------------------
diff --git a/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst b/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst
index 71afa4e..c6ee8b5 100644
--- a/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst
+++ b/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst
@@ -562,7 +562,7 @@ Sink
 
 The purpose of a ``Sink`` is to extract ``Event``\ s from the ``Channel`` and
 forward them to the next Flume Agent in the flow or store them in an external
-repository. A ``Sink`` is associated with one or more ``Channel``\ s, as
+repository. A ``Sink`` is associated with exactly one ``Channel``, as
 configured in the Flume properties file. There’s one ``SinkRunner`` instance
 associated with every configured ``Sink``, and when the Flume framework calls
 ``SinkRunner.start()``, a new thread is created to drive the ``Sink`` (using
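
For illustration, that one-channel binding reads like this in a Flume
properties file (the agent and component names here are made-up placeholders):

.. code-block:: properties

  # a sink is wired to exactly one channel, via the singular "channel" property
  a1.sinks = k1
  a1.sinks.k1.type = logger
  a1.sinks.k1.channel = c1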

http://git-wip-us.apache.org/repos/asf/flume/blob/172355c0/flume-ng-doc/sphinx/FlumeUserGuide.rst
----------------------------------------------------------------------
diff --git a/flume-ng-doc/sphinx/FlumeUserGuide.rst b/flume-ng-doc/sphinx/FlumeUserGuide.rst
index 54c9331..2d7e787 100644
--- a/flume-ng-doc/sphinx/FlumeUserGuide.rst
+++ b/flume-ng-doc/sphinx/FlumeUserGuide.rst
@@ -315,7 +315,7 @@ Consolidation
 
 A very common scenario in log collection is a large number of log producing
 clients sending data to a few consumer agents that are attached to the storage
-subsystem. For examples, logs collected from hundreds of web servers sent to a
+subsystem. For example, logs collected from hundreds of web servers sent to a
 dozen agents that write to an HDFS cluster.
 
 .. figure:: images/UserGuide_image02.png
@@ -361,7 +361,7 @@ Defining the flow
 To define the flow within a single agent, you need to link the sources and
 sinks via a channel. You need to list the sources, sinks and channels for the
 given agent, and then point the source and sink to a channel. A source instance
-can specify multiple channels, but a sink instance can only specify on channel.
+can specify multiple channels, but a sink instance can only specify one channel.
 The format is as follows:
 
 .. code-block:: properties
@@ -377,7 +377,7 @@ The format is as follows:
   # set channel for sink
   <Agent>.sinks.<Sink>.channel = <Channel1>
 
-For example an agent named agent_foo is reading data from an external avro client and sending
+For example, an agent named agent_foo is reading data from an external avro client and sending
 it to HDFS via a memory channel. The config file weblog.config could look like:
 
 .. code-block:: properties
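
As a rough sketch of such a setup (component names are illustrative, not
necessarily the guide's actual weblog.config):

.. code-block:: properties

  agent_foo.sources = avro-src
  agent_foo.channels = mem-ch
  agent_foo.sinks = hdfs-sink

  # a source lists its channels (plural) ...
  agent_foo.sources.avro-src.type = avro
  agent_foo.sources.avro-src.bind = 0.0.0.0
  agent_foo.sources.avro-src.port = 41414
  agent_foo.sources.avro-src.channels = mem-ch

  agent_foo.channels.mem-ch.type = memory

  # ... while a sink names exactly one channel (singular)
  agent_foo.sinks.hdfs-sink.type = hdfs
  agent_foo.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/weblogs
  agent_foo.sinks.hdfs-sink.channel = mem-ch
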
@@ -545,15 +545,15 @@ from the external appserver source eventually getting stored in HDFS.
 Fan out flow
 ------------
 
-As discussed in previous section, Flume support fanning out the flow from one
+As discussed in the previous section, Flume supports fanning out the flow from one
 source to multiple channels. There are two modes of fan out, replicating and
-multiplexing. In the replicating flow the event is sent to all the configured
+multiplexing. In the replicating flow, the event is sent to all the configured
 channels. In the case of multiplexing, the event is sent to only a subset of
 qualifying channels. To fan out the flow, one needs to specify a list of
 channels for a source and the policy for fanning it out. This is done by
 adding a channel "selector" that can be replicating or multiplexing. You then
 specify the selection rules if it's a multiplexer. If you don't specify
-an selector, then by default it's replicating:
+a selector, then by default it's replicating:
 
 .. code-block:: properties
 
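A hedged sketch of the multiplexing variant (agent, source, header, and
channel names are placeholders):

.. code-block:: properties

  # list every channel the source may write to
  <Agent>.sources.<Source1>.channels = <Channel1> <Channel2>
  # route on the value of a chosen event header
  <Agent>.sources.<Source1>.selector.type = multiplexing
  <Agent>.sources.<Source1>.selector.header = <someHeader>
  <Agent>.sources.<Source1>.selector.mapping.<Value1> = <Channel1>
  <Agent>.sources.<Source1>.selector.mapping.<Value2> = <Channel1> <Channel2>
  <Agent>.sources.<Source1>.selector.default = <Channel2>
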
@@ -682,7 +682,7 @@ Property Name        Default      Description
 threads              --           Maximum number of worker threads to spawn
 selector.type
 selector.*
-interceptors         --           Space separated list of interceptors
+interceptors         --           Space-separated list of interceptors
 interceptors.*
 compression-type     none         This can be "none" or "deflate".  The compression-type must match the compression-type of matching AvroSource
 ==================   ===========  ===================================================
@@ -757,7 +757,7 @@ logStdErr        false        Whether the command's stderr should be logged
 batchSize        20           The max number of lines to read and send to the channel at a time
 selector.type    replicating  replicating or multiplexing
 selector.*                    Depends on the selector.type value
-interceptors     --           Space separated list of interceptors
+interceptors     --           Space-separated list of interceptors
 interceptors.*
 ===============  ===========  ==============================================================
 
@@ -795,13 +795,14 @@ Example for agent named a1:
   a1.sources.r1.channels = c1
 
 The 'shell' config is used to invoke the 'command' through a command shell (such as Bash
-or Powershell). The 'command' is passed as argument to 'shell' for execution. This
+or Powershell). The 'command' is passed as an argument to 'shell' for execution. This
 allows the 'command' to use features from the shell such as wildcards, backticks, pipes,
 loops, conditionals, etc. In the absence of the 'shell' config, the 'command' will be
 invoked directly. Common values for 'shell': '/bin/sh -c', '/bin/ksh -c',
 'cmd /c', 'powershell -Command', etc.
 
 .. code-block:: properties
+
   agent_foo.sources.tailsource-1.type = exec
   agent_foo.sources.tailsource-1.shell = /bin/bash -c
   agent_foo.sources.tailsource-1.command = for i in /path/*.txt; do cat $i; done
@@ -839,7 +840,7 @@ Converter
 '''''''''''
 The JMS source allows pluggable converters, though it's likely the default converter will work
 for most purposes. The default converter is able to convert Bytes, Text, and Object messages
-to FlumeEvents. In all cases the properties in the message are added as headers to the
+to FlumeEvents. In all cases, the properties in the message are added as headers to the
 FlumeEvent.
 
 BytesMessage:
 bufferMaxLines        --              (Obsolete) This option is now ignored.
 bufferMaxLineLength   5000            (Deprecated) Maximum length of a line in the commit buffer. Use deserializer.maxLineLength instead.
 selector.type         replicating     replicating or multiplexing
 selector.*                            Depends on the selector.type value
-interceptors          --              Space separated list of interceptors
+interceptors          --              Space-separated list of interceptors
 interceptors.*
 ====================  ==============  ==========================================================
 
@@ -977,7 +978,7 @@ max-line-length  512          Max line length per event body (in bytes)
 ack-every-event  true         Respond with an "OK" for every event received
 selector.type    replicating  replicating or multiplexing
 selector.*                    Depends on the selector.type value
-interceptors     --           Space separated list of interceptors
+interceptors     --           Space-separated list of interceptors
 interceptors.*
 ===============  ===========  ===========================================
 
@@ -1006,7 +1007,7 @@ Property Name   Default      Description
 **type**        --           The component type name, needs to be ``seq``
 selector.type                replicating or multiplexing
 selector.*      replicating  Depends on the selector.type value
-interceptors    --           Space separated list of interceptors
+interceptors    --           Space-separated list of interceptors
 interceptors.*
 batchSize       1
 ==============  ===========  ========================================
@@ -1044,7 +1045,7 @@ Property Name    Default      Description
 eventSize        2500         Maximum size of a single event line, in bytes
 selector.type                 replicating or multiplexing
 selector.*       replicating  Depends on the selector.type value
-interceptors     --           Space separated list of interceptors
+interceptors     --           Space-separated list of interceptors
 interceptors.*
 ==============   ===========  ==============================================
 
@@ -1086,7 +1087,7 @@ readBufferSize        1024              Size of the internal Mina read buffer. P
 numProcessors         (auto-detected)   Number of processors available on the system for use while processing messages. Default is to auto-detect # of CPUs using the Java Runtime API. Mina will spawn 2 request-processing threads per detected CPU, which is often reasonable.
 selector.type         replicating       replicating, multiplexing, or custom
 selector.*            --                Depends on the ``selector.type`` value
-interceptors          --                Space separated list of interceptors.
+interceptors          --                Space-separated list of interceptors.
 interceptors.*
 ====================  ================  ==============================================
 
@@ -1114,7 +1115,7 @@ Property Name   Default      Description
 **port**        --           Port # to bind to
 selector.type                replicating or multiplexing
 selector.*      replicating  Depends on the selector.type value
-interceptors    --           Space separated list of interceptors
+interceptors    --           Space-separated list of interceptors
 interceptors.*
 ==============  ===========  ==============================================
 
@@ -1136,9 +1137,9 @@ A source which accepts Flume Events by HTTP POST and GET. GET should be used
 for experimentation only. HTTP requests are converted into flume events by
 a pluggable "handler" which must implement the HTTPSourceHandler interface.
 This handler takes an HttpServletRequest and returns a list of
-flume events. All events handler from one Http request are committed to the channel
+flume events. All events handled from one HTTP request are committed to the channel
 in one transaction, thus allowing for increased efficiency on channels like
-the file channel. If the handler throws an exception this source will
+the file channel. If the handler throws an exception, this source will
 return an HTTP status of 400. If the channel is full, or the source is unable to
 append events to the channel, the source will return an HTTP 503 - Temporarily
 unavailable status.
@@ -1155,7 +1156,7 @@ handler         ``org.apache.flume.source.http.JSONHandler``  The FQCN of the ha
 handler.*       --                                            Config parameters for the handler
 selector.type   replicating                                   replicating or multiplexing
 selector.*                                                    Depends on the selector.type value
-interceptors    --                                            Space separated list of interceptors
+interceptors    --                                            Space-separated list of interceptors
 interceptors.*
 ==================================================================================================================================
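
Pulling the table together, a minimal HTTP source definition might look like
this (the agent name, port, and channel are illustrative):

.. code-block:: properties

  a1.sources = r1
  a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
  a1.sources.r1.port = 5140
  a1.sources.r1.channels = c1
  # handler is left unset, so the default JSONHandler applies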
 
@@ -1202,7 +1203,7 @@ To set the charset, the request must have content type specified as
 ``application/json; charset=UTF-8`` (replace UTF-8 with UTF-16 or UTF-32 as
 required).
 
-One way to create an event in the format expected by this handler, is to
+One way to create an event in the format expected by this handler is to
 use JSONEvent provided in the Flume SDK and use Google Gson to create the JSON
 string using the Gson#fromJson(Object, Type)
 method. The type token to pass as the 2nd argument of this method
@@ -1246,7 +1247,7 @@ Property Name   Default      Description
 **port**        --           The port # to listen on
 selector.type                replicating or multiplexing
 selector.*      replicating  Depends on the selector.type value
-interceptors    --           Space separated list of interceptors
+interceptors    --           Space-separated list of interceptors
 interceptors.*
 ==============  ===========  ========================================================================================
 
@@ -1273,7 +1274,7 @@ Property Name   Default      Description
 **port**        --           The port # to listen on
 selector.type                replicating or multiplexing
 selector.*      replicating  Depends on the selector.type value
-interceptors    --           Space separated list of interceptors
+interceptors    --           Space-separated list of interceptors
 interceptors.*
 ==============  ===========  ======================================================================================
 
@@ -1302,7 +1303,7 @@ Property Name   Default      Description
 **type**        --           The component type name, needs to be your FQCN
 selector.type                ``replicating`` or ``multiplexing``
 selector.*      replicating  Depends on the selector.type value
-interceptors    --           Space separated list of interceptors
+interceptors    --           Space-separated list of interceptors
 interceptors.*
 ==============  ===========  ==============================================
 
@@ -1728,7 +1729,7 @@ serializer.*      --
 
 Note that this sink takes the Zookeeper Quorum and parent znode information in
 the configuration. Zookeeper Quorum and parent node configuration may be
-specified in the flume configuration file, alternatively these configuration
+specified in the flume configuration file. Alternatively, these configuration
 values are taken from the first hbase-site.xml file in the classpath.
 
 If these are not provided in the configuration, then the sink
@@ -1822,8 +1823,8 @@ Source adds the events and Sink removes it.
 Memory Channel
 ~~~~~~~~~~~~~~
 
-The events are stored in a an in-memory queue with configurable max size. It's
-ideal for flow that needs higher throughput and prepared to lose the staged
+The events are stored in an in-memory queue with configurable max size. It's
+ideal for flows that need higher throughput and are prepared to lose the staged
 data in the event of an agent failure.
 Required properties are in **bold**.
 
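A typical memory channel definition consistent with this description (the
capacity values are illustrative):

.. code-block:: properties

  a1.channels = c1
  a1.channels.c1.type = memory
  # maximum number of events staged in the in-memory queue
  a1.channels.c1.capacity = 10000
  # maximum events per transaction put into or taken from the channel
  a1.channels.c1.transactionCapacity = 1000
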
@@ -1862,7 +1863,7 @@ JDBC Channel
 
 The events are stored in a persistent storage that's backed by a database.
 The JDBC channel currently supports embedded Derby. This is a durable channel
-that's ideal for the flows where recoverability is important.
+that's ideal for flows where recoverability is important.
 Required properties are in **bold**.
 
 ==========================  ====================================  =================================================
@@ -2012,7 +2013,7 @@ Let's say you have aged key-0 out and new files should be encrypted with key-1:
   a1.channels.c1.encryption.keyProvider.keyStorePasswordFile = /path/to/my.keystore.password
   a1.channels.c1.encryption.keyProvider.keys = key-0 key-1
 
-The same scenerio as above, however key-0 has it's own password:
+The same scenario as above, but key-0 has its own password:
 
 .. code-block:: properties
 
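Presumably the per-key password extends the keystore properties from the
previous example along these lines (all paths are illustrative):

.. code-block:: properties

  a1.channels.c1.encryption.activeKey = key-1
  a1.channels.c1.encryption.keyProvider = JCEKSFILE
  a1.channels.c1.encryption.keyProvider.keyStoreFile = /path/to/my.keystore
  a1.channels.c1.encryption.keyProvider.keyStorePasswordFile = /path/to/my.keystore.password
  a1.channels.c1.encryption.keyProvider.keys = key-0 key-1
  # key-0 supplies its own password file instead of the keystore-wide one
  a1.channels.c1.encryption.keyProvider.keys.key-0.passwordFile = /path/to/key-0.password
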
@@ -2136,7 +2137,7 @@ Property Name  Default  Description
 selector.type  --       The component type name, needs to be your FQCN
 =============  =======  ==============================================
 
-Example for agent named a1 and it's source called r1:
+Example for agent named a1 and its source called r1:
 
 .. code-block:: properties
 
@@ -2157,7 +2158,7 @@ Required properties are in **bold**.
 ===================  ===========  =================================================================================
 Property Name        Default      Description
 ===================  ===========  =================================================================================
-**sinks**            --           Space separated list of sinks that are participating in the group
+**sinks**            --           Space-separated list of sinks that are participating in the group
 **processor.type**   ``default``  The component type name, needs to be ``default``, ``failover`` or ``load_balance``
 ===================  ===========  =================================================================================
 
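Declared in the properties file, a sink group looks roughly like this (group
and sink names are illustrative):

.. code-block:: properties

  a1.sinkgroups = g1
  a1.sinkgroups.g1.sinks = k1 k2
  # processor.type picks default, failover, or load_balance behavior
  a1.sinkgroups.g1.processor.type = load_balance
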
@@ -2184,14 +2185,14 @@ Failover Sink Processor
 Failover Sink Processor maintains a prioritized list of sinks, guaranteeing
 that so long as one is available, events will be processed (delivered).
 
-The fail over mechanism works by relegating failed sinks to a pool where
+The failover mechanism works by relegating failed sinks to a pool where
 they are assigned a cool-down period, increasing with sequential failures
-before they are retried. Once a sink successfully sends an event it is
+before they are retried. Once a sink successfully sends an event, it is
 restored to the live pool.
 
 To configure, set a sink group's processor to ``failover`` and set
 priorities for all individual sinks. All specified priorities must
-be unique. Furthermore, upper limit to fail over time can be set
+be unique. Furthermore, an upper limit to failover time can be set
 (in milliseconds) using the ``maxpenalty`` property.
 
 Required properties are in **bold**.
@@ -2199,7 +2200,7 @@ Required properties are in **bold**.
 =================================  ===========  ===================================================================================
 Property Name                      Default      Description
 =================================  ===========  ===================================================================================
-**sinks**                          --           Space separated list of sinks that are participating in the group
+**sinks**                          --           Space-separated list of sinks that are participating in the group
 **processor.type**                 ``default``  The component type name, needs to be ``failover``
 **processor.priority.<sinkName>**  --             <sinkName> must be one of the sink instances associated with the current sink group
 processor.maxpenalty               30000        (in millis)
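
A failover group following this table might read (sink names, priorities, and
the penalty cap are illustrative):

.. code-block:: properties

  a1.sinkgroups = g1
  a1.sinkgroups.g1.sinks = k1 k2
  a1.sinkgroups.g1.processor.type = failover
  # the sink with the highest priority value is tried first
  a1.sinkgroups.g1.processor.priority.k1 = 5
  a1.sinkgroups.g1.processor.priority.k2 = 10
  # cap the failover cool-down at 10 seconds
  a1.sinkgroups.g1.processor.maxpenalty = 10000
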
@@ -2250,7 +2251,7 @@ Required properties are in **bold**.
 =============================  ===============  ==========================================================================
 Property Name                  Default          Description
 =============================  ===============  ==========================================================================
-**processor.sinks**            --               Space separated list of sinks that are participating in the group
+**processor.sinks**            --               Space-separated list of sinks that are participating in the group
 **processor.type**             ``default``      The component type name, needs to be ``load_balance``
 processor.backoff              false            Should failed sinks be backed off exponentially.
 processor.selector             ``round_robin``  Selection mechanism. Must be either ``round_robin``, ``random``
@@ -2363,7 +2364,7 @@ Note that the interceptor builders are passed to the type config parameter. The
 configurable and can be passed configuration values just like they are passed to any other configurable component.
 In the above example, events are passed to the HostInterceptor first and the events returned by the HostInterceptor
 are then passed along to the TimestampInterceptor. You can specify either the fully qualified class name (FQCN)
-or the alias ``timestamp``. If you have multiple collectors writing to the same HDFS path then you could also use
+or the alias ``timestamp``. If you have multiple collectors writing to the same HDFS path, then you could also use
 the HostInterceptor.
 
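A chained setup of that kind, using the aliases (interceptor names i1/i2 are
illustrative):

.. code-block:: properties

  a1.sources.r1.interceptors = i1 i2
  # events pass through i1 (host) first, then i2 (timestamp)
  a1.sources.r1.interceptors.i1.type = host
  a1.sources.r1.interceptors.i2.type = timestamp
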
 Timestamp Interceptor
@@ -2484,7 +2485,7 @@ serializers.<s1>.\ **name**      --
 serializers.*                    --         Serializer-specific properties
 ================================ ========== =================================================================================================
 
-The serializers are used to map the matches to a header name and a formatted header value, by default you only need to specify
+The serializers are used to map the matches to a header name and a formatted header value; by default, you only need to specify
 the header name and the default ``org.apache.flume.interceptor.RegexExtractorInterceptorPassThroughSerializer`` will be used.
 This serializer simply maps the matches to the specified header name and passes the value through as it was extracted by the regex.
 You can plug custom serializer implementations into the extractor using the fully qualified class name (FQCN) to format the matches
@@ -2594,7 +2595,7 @@ Required properties are in **bold**.
 =============  ===========  ==========================================================================
 Property Name  Default      Description
 =============  ===========  ==========================================================================
-**Hosts**      --           A space separated list of host:port
+**Hosts**      --           A space-separated list of host:port
                             at which Flume (through an AvroSource) is listening for events
 Selector       ROUND_ROBIN  Selection mechanism. Must be either ROUND_ROBIN,
                             RANDOM or a custom FQCN of a class that inherits from LoadBalancingSelector.
@@ -2669,7 +2670,7 @@ and can be specified in the flume-env.sh:
 Property Name            Default  Description
 =======================  =======  =====================================================================================
 **type**                 --       The component type name, has to be ``ganglia``
-**hosts**                --       Comma separated list of ``hostname:port``
+**hosts**                --       Comma-separated list of ``hostname:port``
 pollInterval             60       Time, in seconds, between consecutive reports to the ganglia server
 isGanglia3               false    Ganglia server version is 3. By default, Flume sends in ganglia 3.1 format
 =======================  =======  =====================================================================================
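
These are passed to the agent JVM as system properties, e.g. via JAVA_OPTS in
flume-env.sh; a sketch (host names and ports are illustrative):

.. code-block:: properties

  # supplied as -D JVM arguments in flume-env.sh
  flume.monitoring.type = ganglia
  flume.monitoring.hosts = gmond1:8649,gmond2:8649
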
@@ -2821,12 +2822,12 @@ If you need to ingest textual log data into Hadoop/HDFS then Flume is the
 right fit for your problem, full stop. For other use cases, here are some
 guidelines:
 
-Flume is designed to transport and ingest regularly generated event data over
+Flume is designed to transport and ingest regularly-generated event data over
 relatively stable, potentially complex topologies. The notion of "event data"
 is very broadly defined. To Flume, an event is just a generic blob of bytes.
 There are some limitations on how large an event can be - for instance, it
-cannot be larger than you can store in memory or on disk on a single machine -
-but in practice flume events can be everything from textual log entries to
+cannot be larger than what you can store in memory or on disk on a single machine -
+but in practice, flume events can be everything from textual log entries to
 image files. The key property of events is that they are generated in a
 continuous, streaming fashion. If your data is not regularly generated
 (i.e. you are trying to do a single bulk load of data into a Hadoop cluster)
@@ -2905,10 +2906,10 @@ Troubleshooting
 Handling agent failures
 -----------------------
 
-If the Flume agent goes down then the all the flows hosted on that agent are
+If the Flume agent goes down, then all the flows hosted on that agent are
 aborted. Once the agent is restarted, the flow will resume. A flow using a
 file channel or other stable channel will resume processing events where it left
-off. If the agent can't be restarted on the same, then there an option to
+off. If the agent can't be restarted on the same hardware, then there is an option to
 migrate the database to other hardware and set up a new Flume agent that
 can resume processing the events saved in the db. The database HA features
 can be leveraged to move the Flume agent to another host.
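
The durable recovery described here assumes a channel such as the file
channel, whose checkpoint and data directories must survive the restart
(paths are illustrative):

.. code-block:: properties

  a1.channels.c1.type = file
  a1.channels.c1.checkpointDir = /var/flume/checkpoint
  a1.channels.c1.dataDirs = /var/flume/data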

http://git-wip-us.apache.org/repos/asf/flume/blob/172355c0/flume-ng-doc/sphinx/index.rst
----------------------------------------------------------------------
diff --git a/flume-ng-doc/sphinx/index.rst b/flume-ng-doc/sphinx/index.rst
index 1903790..0a8634a 100644
--- a/flume-ng-doc/sphinx/index.rst
+++ b/flume-ng-doc/sphinx/index.rst
@@ -50,7 +50,7 @@ Overview
 - `Flume Wiki <http://cwiki.apache.org/confluence/display/FLUME>`_
 - `Getting Started Guide <http://cwiki.apache.org/confluence/display/FLUME/Getting+Started>`_
 - `Flume Issue Tracking (JIRA) <https://issues.apache.org/jira/browse/FLUME>`_
-- `Flume Source Code (SVN) <http://svn.apache.org/repos/asf/flume/>`_
+- `Flume Source Code (GIT) <https://git-wip-us.apache.org/repos/asf?p=flume.git;a=tree;h=refs/heads/trunk;hb=trunk>`_
 
 Documentation
 -------------