Posted to dev@flume.apache.org by Israel Ekpo <is...@aicer.org> on 2013/03/15 04:50:17 UTC

Possible Conflicting Information Regarding Relationship Between Channels and Sinks within Documentation

Hey guys,

I have a quick question that I would like to ask based on what I found
within the user and developer documentation.

This could cause some confusion for first-time folks.

*Background:*

From the documentation, a Flume source accepts event data and sends it into
a channel.

These event data are queued up in the channel.

A sink takes data from the channel for processing (forwarding to another
agent's source or central repo).

Furthermore, *there can be one source, one or more channels, and one or
more sinks for each agent.*

Within an agent, *a Flume source can write to multiple channels, but a sink
can pull events from only one channel.*

Hence, within this context, the relationship between a source and a channel
can be one-to-many, but the relationship between a sink and a channel is
always one-to-one.

*Potential Conflicting Information in Documentation:*

On this page, http://flume.apache.org/FlumeUserGuide.html#defining-the-flow

It states that *"A source instance can specify multiple channels, but a
sink instance can only specify one channel."*
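To make the user-guide rule concrete, here is a minimal Java sketch (not part of Flume) that reads an agent configuration in the properties format and checks the wiring: a source lists its channels under `.channels`, possibly several, while a sink must name exactly one under `.channel`. The agent name `a1`, the component names, and the class `SinkChannelCheck` are hypothetical; only the `sources`/`channels`/`sinks` key layout follows the user guide.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical checker, not a Flume class: it only mimics the documented key layout.
public class SinkChannelCheck {

    // Splits a space-separated property value into names; a missing value yields an empty list.
    static List<String> names(String value) {
        List<String> out = new ArrayList<>();
        if (value == null) return out;
        for (String s : value.trim().split("\\s+")) {
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    // Loads properties-format text; StringReader does not fail, so the checked exception is wrapped.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Returns wiring problems for one agent; an empty list means the wiring passes this check.
    static List<String> validate(Properties p, String agent) {
        List<String> problems = new ArrayList<>();
        List<String> channels = names(p.getProperty(agent + ".channels"));

        // A source may list one or more channels under ".channels".
        for (String src : names(p.getProperty(agent + ".sources"))) {
            List<String> chs = names(p.getProperty(agent + ".sources." + src + ".channels"));
            if (chs.isEmpty()) problems.add("source " + src + " has no channels");
            for (String c : chs) {
                if (!channels.contains(c)) problems.add("source " + src + " uses unknown channel " + c);
            }
        }

        // A sink must name exactly one channel under ".channel" (no 's').
        for (String sink : names(p.getProperty(agent + ".sinks"))) {
            List<String> chs = names(p.getProperty(agent + ".sinks." + sink + ".channel"));
            if (chs.size() != 1) {
                problems.add("sink " + sink + " must name exactly one channel, found " + chs.size());
            } else if (!channels.contains(chs.get(0))) {
                problems.add("sink " + sink + " uses unknown channel " + chs.get(0));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String ok = String.join("\n",
            "a1.sources = r1",
            "a1.channels = c1 c2",
            "a1.sinks = k1 k2",
            "a1.sources.r1.channels = c1 c2",
            "a1.sinks.k1.channel = c1",
            "a1.sinks.k2.channel = c2");
        System.out.println(validate(parse(ok), "a1")); // prints []

        String bad = ok.replace("a1.sinks.k2.channel = c2", "a1.sinks.k2.channel = c1 c2");
        System.out.println(validate(parse(bad), "a1"));
        // prints [sink k2 must name exactly one channel, found 2]
    }
}
```

The source fanning out to `c1` and `c2` passes, while a sink listing two channels is flagged by this sketch; how Flume itself reports such a mistake is not shown here.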


However, on this page, http://flume.apache.org/FlumeDeveloperGuide.html#sink

I noticed the following sentence:

*A Sink is associated with one or more Channels, as configured in the Flume
properties file.*


*Question and Next Steps*:

Within what context is this an accurate statement for a sink instance?

From the context of a single agent, is this an accurate statement? If not,
can I create a JIRA issue and submit a patch to correct it?

Re: Possible Conflicting Information Regarding Relationship Between Channels and Sinks within Documentation

Posted by Connor Woodson <cw...@gmail.com>.
That statement in the developer guide appears to be inaccurate; feel free
to submit a JIRA to that effect. I believe the user guide is updated much
more often than the developer guide and should thus be considered more
correct (although there are outstanding issues with it, I believe).

To just go back over how each agent should work:

A source can have one or more channels. Depending on the chosen channel
selector, the source will send an event to a specific channel (multiplexing
selector) or copy it to all channels (replicating selector, the default
behavior).
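A toy Java model of the two selector behaviours described above: the replicating selector returns every channel of the source, while the multiplexing selector reads one event header and maps its value to channels, falling back to a default. This is not Flume's `ChannelSelector` API; the class `SelectorSketch`, its methods, the header name `state`, and the mapping values are hypothetical and only imitate the documented behaviour.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of channel selection; not Flume's ChannelSelector interface.
public class SelectorSketch {

    // Replicating: every event is copied to all of the source's channels (headers are ignored).
    static List<String> replicating(List<String> channels, Map<String, String> headers) {
        return channels;
    }

    // Multiplexing: route on the value of one header, using the default channels
    // when the header is missing or its value has no mapping.
    static List<String> multiplexing(String header,
                                     Map<String, List<String>> mapping,
                                     List<String> defaultChannels,
                                     Map<String, String> headers) {
        String value = headers.get(header);
        if (value == null) return defaultChannels;
        return mapping.getOrDefault(value, defaultChannels);
    }

    public static void main(String[] args) {
        List<String> all = List.of("c1", "c2", "c3");
        Map<String, List<String>> mapping = Map.of("CA", List.of("c1"), "AZ", List.of("c2"));
        List<String> fallback = List.of("c3");

        Map<String, String> event = Map.of("state", "AZ");
        System.out.println(replicating(all, event));                            // [c1, c2, c3]
        System.out.println(multiplexing("state", mapping, fallback, event));    // [c2]
        System.out.println(multiplexing("state", mapping, fallback, Map.of())); // [c3]
    }
}
```

Either way the selector only decides which of the source's channels receive the event; it says nothing about how many sinks read each channel.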

Each channel goes from a single Source/Selector to a single Sink Processor
(technically to a Sink Group, and then each group has a defined Processor;
but in the configuration/user guide those two are referenced as a single
thing).

The default sink processor only supports a single sink, but by manually
configuring the sink processor/group you can support multiple sinks. Each
sink in a sink processor group must pull from the same channel. Hence each
sink/sink group reads from a single channel (note that the channel property
for a sink is ".channel", whereas for a source it is ".channels" with the
's'), but each channel is able to go to multiple sinks. Again, a channel
goes to a sink group/processor, something which is not obvious in the
configuration: if two sinks read from the same channel but are not
explicitly in the same sink group, you'll get an error.
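The sink-group arrangement can be modelled with a small Java toy: each sink is bound to exactly one channel, a group accepts its sinks only if they all share that channel, and the group's processor hands work to one sink at a time. The round-robin policy loosely mirrors the idea behind a load-balancing processor; the classes `ToySink` and `RoundRobinGroup` are hypothetical and are not Flume's `Sink`/`SinkProcessor` interfaces.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a sink group; the channel is just an in-memory queue.
public class SinkGroupSketch {

    // A sink reads from exactly one channel, fixed at construction time.
    static final class ToySink {
        final String name;
        final Deque<String> channel;
        final List<String> delivered = new ArrayList<>();

        ToySink(String name, Deque<String> channel) {
            this.name = name;
            this.channel = channel;
        }

        // Takes one event from its channel; returns false when the channel is empty.
        boolean process() {
            String event = channel.poll();
            if (event == null) return false;
            delivered.add(event);
            return true;
        }
    }

    // Toy round-robin processor: every sink in the group must share one channel.
    static final class RoundRobinGroup {
        private final List<ToySink> sinks;
        private int next = 0;

        RoundRobinGroup(List<ToySink> sinks) {
            if (sinks.isEmpty()) throw new IllegalArgumentException("empty sink group");
            Deque<String> shared = sinks.get(0).channel;
            for (ToySink s : sinks) {
                if (s.channel != shared) {
                    throw new IllegalArgumentException("sink " + s.name + " reads from a different channel");
                }
            }
            this.sinks = List.copyOf(sinks);
        }

        // Lets the next sink in rotation take one event from the shared channel.
        boolean process() {
            ToySink s = sinks.get(next);
            next = (next + 1) % sinks.size();
            return s.process();
        }
    }

    public static void main(String[] args) {
        Deque<String> c1 = new ArrayDeque<>(List.of("e1", "e2", "e3", "e4"));
        ToySink k1 = new ToySink("k1", c1);
        ToySink k2 = new ToySink("k2", c1);
        RoundRobinGroup g1 = new RoundRobinGroup(List.of(k1, k2));
        while (g1.process()) { /* drain the shared channel */ }
        System.out.println(k1.delivered + " " + k2.delivered); // [e1, e3] [e2, e4]
    }
}
```

Each event is delivered once, to one sink of the group, which is why the channel-to-group relationship is still one-to-one even though the group holds several sinks.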

- Connor

