Posted to commits@flume.apache.org by mp...@apache.org on 2013/07/02 08:06:19 UTC

svn commit: r1498792 [1/3] - in /flume/site/trunk: ./ content/sphinx/ content/sphinx/releases/

Author: mpercy
Date: Tue Jul  2 06:06:18 2013
New Revision: 1498792

URL: http://svn.apache.org/r1498792
Log:
Update web site for 1.4.0 release.

Added:
    flume/site/trunk/content/sphinx/releases/1.4.0.rst
Modified:
    flume/site/trunk/content/sphinx/FlumeDeveloperGuide.rst
    flume/site/trunk/content/sphinx/FlumeUserGuide.rst
    flume/site/trunk/content/sphinx/documentation.rst
    flume/site/trunk/content/sphinx/download.rst
    flume/site/trunk/content/sphinx/getinvolved.rst
    flume/site/trunk/content/sphinx/index.rst
    flume/site/trunk/content/sphinx/releases/index.rst
    flume/site/trunk/pom.xml

Modified: flume/site/trunk/content/sphinx/FlumeDeveloperGuide.rst
URL: http://svn.apache.org/viewvc/flume/site/trunk/content/sphinx/FlumeDeveloperGuide.rst?rev=1498792&r1=1498791&r2=1498792&view=diff
==============================================================================
--- flume/site/trunk/content/sphinx/FlumeDeveloperGuide.rst (original)
+++ flume/site/trunk/content/sphinx/FlumeDeveloperGuide.rst Tue Jul  2 06:06:18 2013
@@ -14,9 +14,9 @@
    limitations under the License.
 
 
-==========================================
-Flume 1.3.0 Developer Guide
-==========================================
+======================================
+Flume 1.4.0 Developer Guide
+======================================
 
 Introduction
 ============
@@ -160,15 +160,15 @@ by using a convenience implementation su
 ``EventBuilder``\ 's overloaded ``withBody()`` static helper methods.
 
 
-Avro RPC default client
-'''''''''''''''''''''''
+RPC clients - Avro and Thrift
+'''''''''''''''''''''''''''''
 
-As of Flume 1.1.0, Avro is the only supported RPC protocol.  The
-``NettyAvroRpcClient`` implements the ``RpcClient`` interface. The client needs
-to create this object with the host and port of the target Flume agent, and can
-then use the ``RpcClient`` to send data into the agent. The following example
-shows how to use the Flume Client SDK API within a user's data-generating
-application:
+As of Flume 1.4.0, Avro is the default RPC protocol.  The
+``NettyAvroRpcClient`` and ``ThriftRpcClient`` implement the ``RpcClient``
+interface. The client needs to create this object with the host and port of
+the target Flume agent, and can then use the ``RpcClient`` to send data into
+the agent. The following example shows how to use the Flume Client SDK API
+within a user's data-generating application:
 
 .. code-block:: java
 
@@ -206,6 +206,8 @@ application:
       this.hostname = hostname;
       this.port = port;
       this.client = RpcClientFactory.getDefaultInstance(hostname, port);
+      // Use the following method to create a Thrift client (instead of the above line):
+      // this.client = RpcClientFactory.getThriftInstance(hostname, port);
     }
 
     public void sendDataToFlume(String data) {
@@ -220,6 +222,8 @@ application:
         client.close();
         client = null;
         client = RpcClientFactory.getDefaultInstance(hostname, port);
+        // Use the following method to create a thrift client (instead of the above line):
+        // this.client = RpcClientFactory.getThriftInstance(hostname, port);
       }
     }
 
@@ -230,7 +234,8 @@ application:
 
   }
 
-The remote Flume agent needs to have an ``AvroSource`` listening on some port.
+The remote Flume agent needs to have an ``AvroSource`` (or a
+``ThriftSource`` if you are using a Thrift client) listening on some port.
 Below is an example Flume agent configuration that's waiting for a connection
 from MyApp:
 
@@ -244,18 +249,21 @@ from MyApp:
 
   a1.sources.r1.channels = c1
   a1.sources.r1.type = avro
+  # To use a Thrift source, set the following instead of the line above:
+  # a1.sources.r1.type = thrift
   a1.sources.r1.bind = 0.0.0.0
   a1.sources.r1.port = 41414
 
   a1.sinks.k1.channel = c1
   a1.sinks.k1.type = logger
 
-For more flexibility, the default Flume client implementation
-(``NettyAvroRpcClient``) can be configured with these properties:
+For more flexibility, the default Flume client implementations
+(``NettyAvroRpcClient`` and ``ThriftRpcClient``) can be configured with these
+properties:
 
 .. code-block:: properties
 
-  client.type = default
+  client.type = default                # or thrift for the Thrift client
 
   hosts = h1                           # default client accepts only 1 host
                                        # (additional hosts will be ignored)
@@ -274,7 +282,8 @@ Failover Client
 
 This class wraps the default Avro RPC client to provide failover handling
 capability to clients. This takes a whitespace-separated list of <host>:<port>
-representing the Flume agents that make-up a failover group. If there’s a
+representing the Flume agents that make up a failover group. The failover RPC
+client currently does not support Thrift. If there’s a
 communication error with the currently selected host (i.e. agent),
 then the failover client automatically fails-over to the next host in the list.
 For example:
@@ -306,7 +315,7 @@ For more flexibility, the failover Flume
 
   client.type = default_failover
 
-  hosts = h1 h2 h3                     # at least one is required, but 2 or 
+  hosts = h1 h2 h3                     # at least one is required, but 2 or
                                        # more makes better sense
 
   hosts.h1 = host1.example.org:41414
@@ -324,7 +333,7 @@ For more flexibility, the failover Flume
                                        # once to send the Event, and if it
                                        # fails then there will be no failover
                                        # to a second client, so this value
-                                       # causes the failover client to 
+                                       # causes the failover client to
                                        # degenerate into just a default client.
                                        # It makes sense to set this value to at
                                        # least the number of hosts that you
@@ -339,7 +348,7 @@ For more flexibility, the failover Flume
 LoadBalancing RPC client
 ''''''''''''''''''''''''
 
-The Flume Client SDK also supports an RpcClient which load-balances among 
+The Flume Client SDK also supports an RpcClient which load-balances among
 multiple hosts. This type of client takes a whitespace-separated list of
 <host>:<port> representing the Flume agents that make-up a load-balancing group.
 This client can be configured with a load balancing strategy that either
@@ -347,7 +356,8 @@ randomly selects one of the configured h
 fashion. You can also specify your own custom class that implements the
 ``LoadBalancingRpcClient$HostSelector`` interface so that a custom selection
 order is used. In that case, the FQCN of the custom class needs to be specified
-as the value of the ``host-selector`` property.
+as the value of the ``host-selector`` property. The LoadBalancing RPC client
+currently does not support Thrift.
 
 If ``backoff`` is enabled then the client will temporarily blacklist
 hosts that fail, causing them to be excluded from being selected as a failover
@@ -429,7 +439,73 @@ For more flexibility, the load-balancing
 
   connect-timeout = 20000              # Must be >=1000 (default: 20000)
 
-  request-timeout = 20000              # Must be >=1000 (default: 20000)  
+  request-timeout = 20000              # Must be >=1000 (default: 20000)
+
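+The load-balancing configuration above can also be supplied programmatically
+via ``RpcClientFactory.getInstance(Properties)``. The following is a minimal
+sketch, not a complete program — the host names are placeholders and error
+handling is abbreviated:
+
+.. code-block:: java
+
+    Properties props = new Properties();
+    props.put("client.type", "default_loadbalance");
+    props.put("hosts", "h1 h2 h3");
+    props.put("hosts.h1", "host1.example.org:41414");
+    props.put("hosts.h2", "host2.example.org:41414");
+    props.put("hosts.h3", "host3.example.org:41414");
+
+    RpcClient client = RpcClientFactory.getInstance(props);
+    try {
+      client.append(EventBuilder.withBody("test", Charset.forName("UTF-8")));
+    } catch (EventDeliveryException e) {
+      // Handle delivery failure (e.g. recreate the client and retry)
+    } finally {
+      client.close();
+    }
+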
+Embedded agent
+~~~~~~~~~~~~~~
+
+Flume has an embedded agent API which allows users to embed an agent in their
+application. This agent is meant to be lightweight, and as such not all
+sources, sinks, and channels are allowed. Specifically, the source used is a
+special embedded source, and events should be sent to the source via the
+``put()`` and ``putAll()`` methods on the ``EmbeddedAgent`` object. Only the
+File Channel and Memory Channel are allowed as channels, while the Avro Sink
+is the only supported sink.
+
+Note: The embedded agent has a dependency on hadoop-core.jar.
+
+Configuration of an embedded agent is similar to configuration of a full
+agent. The following is an exhaustive list of configuration options:
+
+Required properties are in **bold**.
+
+====================  ================  ==============================================
+Property Name         Default           Description
+====================  ================  ==============================================
+source.type           embedded          The only available source is the embedded source.
+**channel.type**      --                Either ``memory`` or ``file`` which correspond to MemoryChannel and FileChannel respectively.
+channel.*             --                Configuration options for the channel type requested, see MemoryChannel or FileChannel user guide for an exhaustive list.
+**sinks**             --                List of sink names
+**sink.type**         --                Property name must match a name in the list of sinks. Value must be ``avro``
+sink.*                --                Configuration options for the sink. See AvroSink user guide for an exhaustive list, however note AvroSink requires at least hostname and port.
+**processor.type**    --                Either ``failover`` or ``load_balance`` which correspond to FailoverSinkProcessor and LoadBalancingSinkProcessor respectively.
+processor.*           --                Configuration options for the sink processor selected. See FailoverSinkProcessor and LoadBalancingSinkProcessor user guide for an exhaustive list.
+====================  ================  ==============================================
+
+Below is an example of how to use the agent:
+
+.. code-block:: java
+
+    Map<String, String> properties = new HashMap<String, String>();
+    properties.put("channel.type", "memory");
+    properties.put("channel.capacity", "200");
+    properties.put("sinks", "sink1 sink2");
+    properties.put("sink1.type", "avro");
+    properties.put("sink2.type", "avro");
+    properties.put("sink1.hostname", "collector1.apache.org");
+    properties.put("sink1.port", "5564");
+    properties.put("sink2.hostname", "collector2.apache.org");
+    properties.put("sink2.port",  "5565");
+    properties.put("processor.type", "load_balance");
+
+    EmbeddedAgent agent = new EmbeddedAgent("myagent");
+
+    agent.configure(properties);
+    agent.start();
+
+    List<Event> events = Lists.newArrayList();
+
+    events.add(event);
+    events.add(event);
+    events.add(event);
+    events.add(event);
+
+    agent.putAll(events);
+
+    ...
+
+    agent.stop();
+
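+If the application is built with Maven, the embedded agent classes typically
+come from a dedicated artifact. The coordinates below are an assumption and
+should be verified against the 1.4.0 release:
+
+.. code-block:: xml
+
+    <dependency>
+      <groupId>org.apache.flume</groupId>
+      <artifactId>flume-ng-embedded-agent</artifactId>
+      <version>1.4.0</version>
+    </dependency>
+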
 
 Transaction interface
 ~~~~~~~~~~~~~~~~~~~~~
@@ -486,7 +562,7 @@ Sink
 
 The purpose of a ``Sink`` is to extract ``Event``\ s from the ``Channel`` and
 forward them to the next Flume Agent in the flow or store them in an external
-repository. A ``Sink`` is associated with one or more ``Channel``\ s, as
+repository. A ``Sink`` is associated with exactly one ``Channel``, as
 configured in the Flume properties file. There’s one ``SinkRunner`` instance
 associated with every configured ``Sink``, and when the Flume framework calls
 ``SinkRunner.start()``, a new thread is created to drive the ``Sink`` (using
@@ -512,7 +588,7 @@ processing its own configuration setting
 
       // Process the myProp value (e.g. validation)
 
-      // Store myProp for later retrieval by process() method 
+      // Store myProp for later retrieval by process() method
       this.myProp = myProp;
     }