
Posted to user@flume.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2012/01/05 23:51:09 UTC

Monitoring FlumeNG

Hi,

I'm interested in monitoring Flume NG and am wondering whether one should do so by grabbing the JSONized metrics, as described on this six-month-old page:
https://cwiki.apache.org/confluence/display/FLUME/Monitoring+Flume 


Or via JMX, which is mentioned here:
https://cwiki.apache.org/confluence/display/FLUME/Flume+NG#FlumeNG-CriticalFeatures 


Or wait for 1.1.0:
https://issues.apache.org/jira/browse/FLUME-749 

https://issues.apache.org/jira/browse/FLUME-748 


Thanks,
Otis

Re: Flume-NG Channels

Posted by Arvind Prabhakar <ar...@apache.org>.

Praveen,

While I agree with you that this should be a first class concept, I do feel
that it does not merit changing the event interface (to specify a category
for example).

On the other hand, the header namespace "flume.*" will be reserved for
flume internal handling and routing - so that could be considered a
standardization of sorts - making it as close to a first class concept as
possible without requiring an interface change.

Regarding your suggestion of prefix/suffix comparison - that could well be
a selector implementation. So out of the box we will have three selectors -
a replicating selector, a fixed string mapping selector, and a
prefix/suffix match selector.

On the prefix/suffix match selector, I want to highlight Ralph's
earlier comment on the performance impact of doing any non-trivial
processing at the point of de-multiplexing. Any overhead there will likely
slow event delivery enough to create a significant backlog for the upstream
flow.  That said, I don't mind having an implementation as long as the
performance tradeoffs are well understood.
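
To make the tradeoff concrete, here is a minimal sketch of what a prefix
match selector could look like. It is an illustration only and not the Flume
NG API: the class name PrefixSelectorSketch, the use of plain strings as
channel names, and the "first configured prefix wins" rule are all
assumptions. Unlike a fixed string mapping, it has to scan the configured
prefixes for each event, so the per-event cost grows with the number of
prefixes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Flume NG API. Channels are plain names here.
final class PrefixSelectorSketch {
    // Prefix -> channel name, kept in the iteration order of the supplied map.
    private final Map<String, String> channelByPrefix = new LinkedHashMap<>();
    private final String defaultChannel;

    PrefixSelectorSketch(Map<String, String> channelByPrefix, String defaultChannel) {
        // Built once at initialization from static configuration.
        this.channelByPrefix.putAll(channelByPrefix);
        this.defaultChannel = defaultChannel;
    }

    // Linear scan over the configured prefixes: O(number of prefixes) per event.
    String select(String headerValue) {
        if (headerValue != null) {
            for (Map.Entry<String, String> entry : channelByPrefix.entrySet()) {
                if (headerValue.startsWith(entry.getKey())) {
                    return entry.getValue(); // first matching prefix wins (assumption)
                }
            }
        }
        return defaultChannel;
    }
}
```

With the prefix "instrument." mapped to "ticker channel", both
"instrument.stock" and "instrument.mutual_fund" would select "ticker
channel", and any other value would fall through to the default.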

I have created FLUME-930
<https://issues.apache.org/jira/browse/FLUME-930> to track this
requirement. Let's continue the discussion on this JIRA
henceforth.

Thanks,
Arvind

On Thu, Jan 12, 2012 at 3:49 AM, Praveen Ramachandra <
praveen_ramachandra@yahoo.com> wrote:

> Awesome that works
>
> two things comes to my mind
>
> 1. From a practical point-of-view we should have the ability to have
> prefix/suffix (full regex could be an overkill) specified to map to channels
> 1.1 e.g., "instrument.*" --> "ticker channel" will map "instrument.stock"
> and instrument.mutual_fund to "ticker channel"
> 2. From a product perspective this should be a first class "concept" like
> channel, source, sink etc., some candidates that comes to my mind "flow",
> "category", "e2echannel" (yak :-), might be source of confusion  ) etc.,
> 2.1 as in event "flow", "event" category etc.,
>
>
> --
> Regards,
> Praveen Ramachandra
>
>
>    ------------------------------
> *From:* "arvind@cloudera.com" <ar...@cloudera.com>
> *To:* flume-user@incubator.apache.org
> *Sent:* Thursday, January 12, 2012 11:34 AM
> *Subject:* Re: Flume-NG Channels
>
> Point taken Ralph - avoiding a for-loop within the implementation of a
> channel selector is important for performance. In this particular case
> (that Praveen describes), the channel selector will be making a mapping
> based decision. For example:
>
> header value  --> channel
> "stock" --> "ticker channel"
> "temp" --> "weather channel"
>
> All of this information will be statically configured for the agent and so
> the selector will be able to configure itself during initialization and
> create this mapping. Once setup, when an event arrives, the lookup will be
> constant time to figure out which channel must be used (hashtable/hashmap).
>
> Do you see any issues with such implementation?
>
> Thanks,
> Arvind
>
>
> On Wed, Jan 11, 2012 at 9:24 PM, Ralph Goers <ra...@dslextreme.com> wrote:
>
> One thing I've learned from working on Log4j 2.0 is that for loops are
> actually a lot slower than you might think. In a configuration that desires
> a single channel there should be no for loop. Instead, it should go
> directly to the channel. In the case of multiple channels then the
> "channel" that is selected should be a multiplexing channel that is
> configured with other channels. The for loop (or while loop) is in the
> multiplexing channel.
>
> Thus, your ChannelSelector could (and should) in fact, be a Channel that
> can select any or all of its configured channels.
>
> FWIW, in Log4j 2 in the XML configuration you would specify
>
> <RollingFileAppender name="MainAppender" ...>
>   <MarkerFilter marker="MyMarker"/>
> </RollingFileAppender>
>
> or
>
> <RollingFileAppender name="MainAppender" ...>
>   <filters>
>     <MarkerFilter marker="MyMarker"/>
>     <ThresholdFilter level="DEBUG"/>
>   </filters>
> </RollingFileAppender>
>
> The filters element is actually a CompositeFilter that invokes each of its
> configured filters in turn.
>
> Ralph
>
> On Jan 11, 2012, at 5:55 PM, Arvind Prabhakar wrote:
>
> Hi Praveen,
>
> Here is what I could muster up after some thought on this use-case:
>
>
>    - We modify the source interface to accept a "Channel Processor", a
>    new component that is responsible for putting the event into one or more
>    channels.
>    - A channel processor will delegate the selection of the channel to
>    place the event on via another component called "Channel Selector" which is
>    responsible for selecting the appropriate channel from the list of channels
>    the source is configured with.
>    - The default implementation of channel selector in the channel
>    processor will be a "replicating channel selector" which will result in the
>    event being copied over to all configured channels.
>    - Another implementation of the channel selector will be "Mapping
>    Channel Selector" which will allow events to be mapped to different
>    channel(s) based on the value of a specified header.
>
>
> With this facility, you will be able to inject headers into events at the
> point of origination and then configure the mapping channel selector at
> each source in the pipeline to place the event on separate channels as
> desired based on the value of the header.
>
> Do you think this will adequately address your use case? If not, what do
> you think is missing here.
>
> Thanks,
> Arvind
>
>
> On Tue, Jan 10, 2012 at 8:03 PM, Praveen Ramachandra <
> praveen_ramachandra@yahoo.com> wrote:
>
> Hi
>
> Security is not the reason for isolation.
>
> Isolation could be used to realize quite a few quality attributes of the
> system, e.g., many aspects of QoS.
>
> Regardless, if we have specific event handling requirement that are
> different for each "kind" of data the question is how do one realize it
> using flume-ng.
>
> As it stands currently, sources/sinks & channels are tied to the hip,
> which is fine. Only issue is requiring to allocate dedicated host/port to
> achieve.
>
>
> As I had mentioned in my first email, one could develop custom
> sources/sinks and configuration that goes along with to mux/demux events
> that are flowing through the system.
>
> Question to ask ourself is, why is there a need to have a change in
> deployment to accommodate a new "flow" in the system.
>
>
>
> --
> Regards,
> Praveen Ramachandra
>
>   ------------------------------
> *From:* Ralph Goers <ra...@dslextreme.com>
> *To:* flume-user@incubator.apache.org
> *Sent:* Tuesday, January 10, 2012 6:01 PM
> *Subject:* Re: Flume-NG Channels
>
> When you speak of flow isolation are you doing that for security, failure
> protection or for some other reason?  From a failure protection case you
> would need physically different Flume agents, not just channels. I'm not
> sure what the security gains are in isolation, if any.
>
> I guess to give you a proper response I would want to know what your
> actual requirements are and possibly why.
>
> For what its worth, I also work in a multi-tenant environment and this has
> never been a requirement.
>
> Ralph
>
>
>
> On Jan 10, 2012, at 12:42 AM, Praveen Ramachandra wrote:
>
> Hi arvind,
>
> Thanks for responding.
>
> if we want to model separation not only in transit but also at rest i.e.,
> if channel has a filechannel/jdbcchannel/memorychannel backing separation
> is required when data resides in those channels before they are shipped to
> the next hop.
>
> on multi-tenant, I was trying to figure out from isolation perspective.
> Flow isolation is required from one collecting agent tier, to aggregating
> agent-tier and a tier that is going to deposit/deliver the events.
>
> "How do you propose the platform be modified in order to support this
> use-case?" you ask, Thinking out loud now :-).
> One option is to have a notion of a flow that is visible at flume-ng
> level, applications will map channels to flows and sources/sinks across
> agent tiers, can mux/demux it appropriately.
>
> This will also decouple mapping across agent tiers i.e.,
>
> If you smell scribe in my above description, I wouldn't hold it against
> you :-). Honestly the simplicity of scribe let us prototype for our use
> case in a matter of hour or two, compared to many days that it took to get
> almost similar thing prototyped with flume. We even struggle today to model
> the use cases seamlessly in flume (og or ng).
>
>
> --
> Regards,
> Praveen Ramachandra
>
>
>
>
>   ------------------------------
> *From:* Arvind Prabhakar <ar...@apache.org>
> *To:* flume-user@incubator.apache.org; Praveen Ramachandra <
> praveen_ramachandra@yahoo.com>
> *Sent:* Monday, January 9, 2012 11:15 PM
> *Subject:* Re: Flume-NG Channels
>
> Hi Praveen,
>
> First to your question:
>
> > Did I get the modeling right with flume-ng
>
> More-or-less yes. The one distinction that I would like to point out
> is that having separate source-sink end points for individual channels
> is stemming more from your requirement than by design of flume. A
> channel in flume implementation does not care how many sources write
> to it or how many sink's read from it.
>
> > 2. Is there a better way to do it at a platform level
> >             2.1 I know if I can write a bunch of custom sinks/sources and
> > embed a notion of channel to which each events belong to in the message, I
> > can effectively mux and demux the events at either ends.
>
> The key issue here is the layering of a multi-tenant semantic on top
> of flows. Since fundamentally flume is not aware of the contents of
> the events in a flow, and does not expose any client auth/id model -
> there is no inherent support of doing this out of the box.
>
> Moreover, from your description it seems that the channels that
> logically separate out the flows will operate within the same agent.
> If that is the case, then it may be a better option to use a single
> channel and have a multiplexing terminal sink that can route the
> messages to the correct destination.
>
> >             2.2 Which means the default support for channel is also not of
> > much use
>
> How do you propose the platform be modified in order to support this
> use-case?
>
> Thanks,
> Arvind
>
>
>
> On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
> <pr...@yahoo.com> wrote:
> > They are in low 100's in the best case scenario, and could be in 1000 in the
> > worst case scenario.
> >
> > I believe this aspect can be pretty much shielded from application if the
> > underlying platform has the right set of responsibilities.
> >
> >
> > --
> > Regards,
> > Praveen Ramachandra
> >
> >
> >
> > ________________________________
> > From: Ralph Goers <ra...@dslextreme.com>
> > To: flume-user@incubator.apache.org
> > Sent: Monday, January 9, 2012 6:53 PM
> > Subject: Re: Flume-NG Channels
> >
> >
> > On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
> >
> > Hi,
> >
> > We were trying to design a multi-tenanted system using flume-ng, where each
> > logically independent data set is modelled through a channel going through
> > the system of collectors, aggregators and delivery agents (to end
> > destination). Each channel will carry data that logically belong together.
> > The requirement is that we should be able to bring up and tear down a
> > channel with ease.
> >
> >
> > When we completed the exercise, it turned out that we have to run a separate
> > Source/Sink, at a designated host/port combination for each channel. The
> > issue with this is that, it is an operational overhead that we have work
> > with net-ops to punch holes in the firewall to let tcp traffic flow on
> > non-standard ports. I would imagine that it would be the case in many
> > organizations as well.
> >
> > Two questions.
> >
> > 1. Did I get the modeling right with flume-ng
> > 2. Is there a better way to do it at a platform level
> >             2.1 I know if I can write a bunch of custom sinks/sources and
> > embed a notion of channel to which each events belong to in the message, I
> > can effectively mux and demux the events at either ends.
> >             2.2 Which means the default support for channel is also not of
> > much use
> >
> >
> > What is your target destination(s) for the tenants?  Can they all flow
> > through a single channel in Flume and then be delivered to the correct
> > destination by a smarter sink at the end?
> >
> > Ralph
> >
> >

Re: Flume-NG Channels

Posted by Praveen Ramachandra <pr...@yahoo.com>.

Awesome, that works.

Two things come to mind:

1. From a practical point of view, we should be able to specify a prefix/suffix (full regex could be overkill) to map to channels.
1.1 e.g., "instrument.*" --> "ticker channel" will map "instrument.stock" and "instrument.mutual_fund" to "ticker channel"
2. From a product perspective, this should be a first-class "concept" like channel, source, sink, etc. Some candidates that come to mind: "flow", "category", "e2echannel" (yak :-), might be a source of confusion), etc.
2.1 as in event "flow", event "category", etc.
 

--
Regards,
Praveen Ramachandra



________________________________
 From: "arvind@cloudera.com" <ar...@cloudera.com>
To: flume-user@incubator.apache.org 
Sent: Thursday, January 12, 2012 11:34 AM
Subject: Re: Flume-NG Channels
 

Point taken Ralph - avoiding a for-loop within the implementation of a channel selector is important for performance. In this particular case (that Praveen describes), the channel selector will be making a mapping based decision. For example:

header value  --> channel
"stock" --> "ticker channel"
"temp" --> "weather channel"

All of this information will be statically configured for the agent and so the selector will be able to configure itself during initialization and create this mapping. Once setup, when an event arrives, the lookup will be constant time to figure out which channel must be used (hashtable/hashmap).

Do you see any issues with such implementation?

Thanks,
Arvind



On Wed, Jan 11, 2012 at 9:24 PM, Ralph Goers <ra...@dslextreme.com> wrote:

One thing I've learned from working on Log4j 2.0 is that for loops are actually a lot slower than you might think. In a configuration that desires a single channel there should be no for loop. Instead, it should go directly to the channel. In the case of multiple channels then the "channel" that is selected should be a multiplexing channel that is configured with other channels. The for loop (or while loop) is in the multiplexing channel.
>
>
>Thus, your ChannelSelector could (and should) in fact, be a Channel that can select any or all of its configured channels.
>
>
>FWIW, in Log4j 2 in the XML configuration you would specify
>
>
><RollingFileAppender name="MainAppender" ...>
>  <MarkerFilter marker="MyMarker"/>
></RollingFileAppender>
>
>
>or 
>
>
><RollingFileAppender name="MainAppender" ...>
>  <filters>
>    <MarkerFilter marker="MyMarker"/>
>    <ThresholdFilter level="DEBUG"/>
>  </filters>
></RollingFileAppender>
>
>
>The filters element is actually a CompositeFilter that invokes each of its configured filters in turn.
>
>Ralph
>
>
>On Jan 11, 2012, at 5:55 PM, Arvind Prabhakar wrote:
>
>Hi Praveen,
>>
>>
>>Here is what I could muster up after some thought on this use-case:
>>
>>
>>	* We modify the source interface to accept a "Channel Processor", a new component that is responsible for putting the event into one or more channels. 
>>	* A channel processor will delegate the selection of the channel to place the event on via another component called "Channel Selector" which is responsible for selecting the appropriate channel from the list of channels the source is configured with.
>>	* The default implementation of channel selector in the channel processor will be a "replicating channel selector" which will result in the event being copied over to all configured channels.
>>	* Another implementation of the channel selector will be "Mapping Channel Selector" which will allow events to be mapped to different channel(s) based on the value of a specified header.
>>
>>
>>With this facility, you will be able to inject headers into events at the point of origination and then configure the mapping channel selector at each source in the pipeline to place the event on separate channels as desired based on the value of the header.
>>
>>
>>Do you think this will adequately address your use case? If not, what do you think is missing here.
>>
>>
>>Thanks,
>>Arvind
>>
>>
>>On Tue, Jan 10, 2012 at 8:03 PM, Praveen Ramachandra <pr...@yahoo.com> wrote:
>>
>>Hi
>>>
>>>
>>>Security is not the reason for isolation.
>>>
>>>
>>>Isolation could be used to realize quite a few quality attributes of the system, e.g., many aspects of QoS.
>>>
>>>
>>>Regardless, if we have specific event handling requirement that are different for each "kind" of data the question is how do one realize it using flume-ng.
>>>
>>>
>>>As it stands currently, sources/sinks & channels are tied to the hip, which is fine. Only issue is requiring to allocate dedicated host/port to achieve. 
>>>
>>>
>>>
>>>
>>>As I had mentioned in my first email, one could develop custom sources/sinks and configuration that goes along with to mux/demux events that are flowing through the system. 
>>>
>>>
>>>Question to ask ourself is, why is there a need to have a change in deployment to accommodate a new "flow" in the system.
>>>
>>>
>>>
>>>
>>>
>>>
>>>--
>>>Regards,
>>>Praveen Ramachandra
>>>
>>>
>>>
>>>________________________________
>>> From: Ralph Goers <ra...@dslextreme.com>
>>>To: flume-user@incubator.apache.org 
>>>Sent: Tuesday, January 10, 2012 6:01 PM
>>>Subject: Re: Flume-NG Channels
>>> 
>>>
>>>
>>>When you speak of flow isolation are you doing that for security, failure protection or for some other reason?  From a failure protection case you would need physically different Flume agents, not just channels. I'm not sure what the security gains are in isolation, if any.
>>>
>>>
>>>I guess to give you a proper response I would want to know what your actual requirements are and possibly why. 
>>>
>>>
>>>For what its worth, I also work in a multi-tenant environment and this has never been a requirement. 
>>>
>>>
>>>Ralph
>>>
>>>
>>>
>>>
>>>
>>>
>>>On Jan 10, 2012, at 12:42 AM, Praveen Ramachandra wrote:
>>>
>>>Hi arvind,
>>>>
>>>>
>>>>Thanks for responding.
>>>>
>>>>
>>>>if we want to model separation not only in transit but also at rest i.e., if channel has a filechannel/jdbcchannel/memorychannel backing separation is required when data resides in those channels before they are shipped to the next hop.
>>>>
>>>>
>>>>on multi-tenant, I was trying to figure out from isolation perspective. Flow isolation is required from one collecting agent tier, to aggregating agent-tier and a tier that is going to deposit/deliver the events.
>>>>
>>>>
>>>>"How do you propose the platform be modified in order to support this use-case?" you ask, Thinking out loud now :-).
>>>>
>>>>One option is to have a notion of a flow that is visible at flume-ng level, applications will map channels to flows and sources/sinks across agent tiers, can mux/demux it appropriately.
>>>>
>>>>
>>>>This will also decouple mapping across agent tiers i.e., 
>>>>
>>>>
>>>>
>>>>If you smell scribe in my above description, I wouldn't hold it against you :-). Honestly the simplicity of scribe let us prototype for our use case in a matter of hour or two, compared to many days that it took to get almost similar thing prototyped with flume. We even struggle today to model the use cases seamlessly in flume (og or ng).
>>>>
>>>>
>>>>
>>>>
>>>>--
>>>>Regards,
>>>>Praveen Ramachandra
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>________________________________
>>>> From: Arvind Prabhakar <ar...@apache.org>
>>>>To: flume-user@incubator.apache.org; Praveen Ramachandra <pr...@yahoo.com> 
>>>>Sent: Monday, January 9, 2012 11:15 PM
>>>>Subject: Re: Flume-NG Channels
>>>> 
>>>>Hi Praveen,
>>>>
>>>>First to your question:
>>>>
>>>>> Did I get the modeling right with flume-ng
>>>>
>>>>More-or-less yes. The one distinction that I would like to point out
>>>>is that having separate source-sink end points for individual channels
>>>>is stemming more from your requirement than by design of flume. A
>>>>channel in flume implementation does not care how many sources write
>>>>to it or how many sink's read from it.
>>>>
>>>>> 2. Is there a better way to do it at a platform level
>>>>>             2.1 I know if I can write a bunch of custom sinks/sources and
>>>>> embed a notion of channel to which each events belong to in the message, I
>>>>> can effectively mux and demux the events at either ends.
>>>>
>>>>The key issue here is the layering of a multi-tenant semantic on top
>>>>of flows. Since fundamentally flume is not aware of the contents of
>>>>the events in a flow, and does not expose any client auth/id model -
>>>>there is no inherent support of doing this out of the box.
>>>>
>>>>Moreover, from your description it seems that the channels that
>>>>logically separate out the flows will operate within the same agent.
>>>>If that is the case, then it may be a better option to use a single
>>>>channel and have a multiplexing terminal sink that can route the
>>>>messages to the correct destination.
>>>>
>>>>>             2.2 Which means the default support for channel is also not of
>>>>> much use
>>>>
>>>>How do you propose the platform be modified in order to support this use-case?
>>>>
>>>>Thanks,
>>>>Arvind
>>>>
>>>>
>>>>
>>>>On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
>>>><pr...@yahoo.com> wrote:
>>>>> They are in low 100's in the best case scenario, and could be in 1000 in the
>>>>> worst case scenario.
>>>>>
>>>>> I believe this aspect can be pretty much shielded from application if the
>>>>> underlying platform has the right set of responsibilities.
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Praveen Ramachandra
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Ralph Goers <ra...@dslextreme.com>
>>>>> To: flume-user@incubator.apache.org
>>>>> Sent: Monday, January 9, 2012 6:53 PM
>>>>> Subject: Re: Flume-NG Channels
>>>>>
>>>>>
>>>>> On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We were trying to design a multi-tenanted system using flume-ng, where each
>>>>> logically independent data set is modelled through a channel going through
>>>>> the system of collectors, aggregators and delivery agents (to end
>>>>> destination). Each channel will carry data that logically belong together.
>>>>> The requirement is that we should be able to bring up and tear down a
>>>>> channel with ease.
>>>>>
>>>>>
>>>>> When we completed the exercise, it turned out that we have to run a separate
>>>>> Source/Sink, at a designated host/port combination for each channel. The
>>>>> issue with this is that, it is an operational overhead that we have work
>>>>> with net-ops to punch holes in the firewall to let tcp traffic flow on
>>>>> non-standard ports. I would imagine that it would be the case in many
>>>>> organizations as well.
>>>>>
>>>>> Two questions.
>>>>>
>>>>> 1. Did I get the modeling right with flume-ng
>>>>> 2. Is there a better way to do it at a platform level
>>>>>             2.1 I know if I can write a bunch of custom sinks/sources and
>>>>> embed a notion of channel to which each events belong to in the message, I
>>>>> can effectively mux and demux the events at either ends.
>>>>>             2.2 Which means the default support for channel is also not of
>>>>> much use
>>>>>
>>>>>
>>>>> What is your target destination(s) for the tenants?  Can they all flow
>>>>> through a single channel in Flume and then be delivered to the correct
>>>>> destination by a smarter sink at the end?
>>>>>
>>>>> Ralph
>>>>>
>>>>>

Re: Flume-NG Channels

Posted by "arvind@cloudera.com" <ar...@cloudera.com>.

Point taken, Ralph - avoiding a for-loop within the implementation of a
channel selector is important for performance. In this particular case
(that Praveen describes), the channel selector will be making a
mapping-based decision. For example:

header value  --> channel
"stock" --> "ticker channel"
"temp" --> "weather channel"

All of this information will be statically configured for the agent, so
the selector will be able to configure itself during initialization and
create this mapping. Once set up, when an event arrives, the lookup to figure
out which channel must be used will be constant time (hashtable/hashmap).
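
As a rough illustration of the idea (a sketch only, not the eventual Flume NG
interface), the selector could hold a hash map built once from the agent
configuration. The class name MappingSelectorSketch, the header name "type"
used in the usage note, and the use of plain strings for channel names are
all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Flume NG API. Channels are plain names here.
final class MappingSelectorSketch {
    private final String headerName;
    private final Map<String, List<String>> channelsByHeaderValue;
    private final List<String> defaultChannels;

    MappingSelectorSketch(String headerName,
                          Map<String, List<String>> mapping,
                          List<String> defaultChannels) {
        this.headerName = headerName;
        // Copied once at initialization; the mapping is static afterwards.
        this.channelsByHeaderValue = new HashMap<>(mapping);
        this.defaultChannels = defaultChannels;
    }

    // One hash lookup per event; no loop over the configured mappings.
    List<String> select(Map<String, String> eventHeaders) {
        String value = eventHeaders.get(headerName);
        List<String> channels = channelsByHeaderValue.get(value);
        return channels != null ? channels : defaultChannels;
    }
}
```

Configured with "stock" --> "ticker channel" and "temp" --> "weather channel"
on a header named "type", an event carrying type=stock would be routed to
"ticker channel", and an event with no matching value would go to whatever
default list was supplied.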

Do you see any issues with such an implementation?

Thanks,
Arvind


On Wed, Jan 11, 2012 at 9:24 PM, Ralph Goers <ra...@dslextreme.com> wrote:

> One thing I've learned from working on Log4j 2.0 is that for loops are
> actually a lot slower than you might think. In a configuration that desires
> a single channel there should be no for loop. Instead, it should go
> directly to the channel. In the case of multiple channels then the
> "channel" that is selected should be a multiplexing channel that is
> configured with other channels. The for loop (or while loop) is in the
> multiplexing channel.
>
> Thus, your ChannelSelector could (and should) in fact, be a Channel that
> can select any or all of its configured channels.
>
> FWIW, in Log4j 2 in the XML configuration you would specify
>
> <RollingFileAppender name="MainAppender" ...>
>   <MarkerFilter marker="MyMarker"/>
> </RollingFileAppender>
>
> or
>
> <RollingFileAppender name="MainAppender" ...>
>   <filters>
>     <MarkerFilter marker="MyMarker"/>
>     <ThresholdFilter level="DEBUG"/>
>   </filters>
> </RollingFileAppender>
>
> The filters element is actually a CompositeFilter that invokes each of its
> configured filters in turn.
>
> Ralph
>
> On Jan 11, 2012, at 5:55 PM, Arvind Prabhakar wrote:
>
> Hi Praveen,
>
> Here is what I could muster up after some thought on this use-case:
>
>
>    - We modify the source interface to accept a "Channel Processor", a
>    new component that is responsible for putting the event into one or more
>    channels.
>    - A channel processor will delegate the selection of the channel to
>    place the event on via another component called "Channel Selector" which is
>    responsible for selecting the appropriate channel from the list of channels
>    the source is configured with.
>    - The default implementation of channel selector in the channel
>    processor will be a "replicating channel selector" which will result in the
>    event being copied over to all configured channels.
>    - Another implementation of the channel selector will be "Mapping
>    Channel Selector" which will allow events to be mapped to different
>    channel(s) based on the value of a specified header.
>
>
> With this facility, you will be able to inject headers into events at the
> point of origination and then configure the mapping channel selector at
> each source in the pipeline to place the event on separate channels as
> desired based on the value of the header.
>
> Do you think this will adequately address your use case? If not, what do
> you think is missing here.
>
> Thanks,
> Arvind
>
>
> On Tue, Jan 10, 2012 at 8:03 PM, Praveen Ramachandra <
> praveen_ramachandra@yahoo.com> wrote:
>
>> Hi
>>
>> Security is not the reason for isolation.
>>
>> Isolation could be used to realize quite a few quality attributes of the
>> system, e.g., many aspects of QoS.
>>
>> Regardless, if we have specific event handling requirement that are
>> different for each "kind" of data the question is how do one realize it
>> using flume-ng.
>>
>> As it stands currently, sources/sinks & channels are tied to the hip,
>> which is fine. Only issue is requiring to allocate dedicated host/port to
>> achieve.
>>
>>
>> As I had mentioned in my first email, one could develop custom
>> sources/sinks and configuration that goes along with to mux/demux events
>> that are flowing through the system.
>>
>> Question to ask ourself is, why is there a need to have a change in
>> deployment to accommodate a new "flow" in the system.
>>
>>
>>
>> --
>> Regards,
>> Praveen Ramachandra
>>
>>   ------------------------------
>> *From:* Ralph Goers <ra...@dslextreme.com>
>> *To:* flume-user@incubator.apache.org
>> *Sent:* Tuesday, January 10, 2012 6:01 PM
>> *Subject:* Re: Flume-NG Channels
>>
>> When you speak of flow isolation are you doing that for security, failure
>> protection or for some other reason?  From a failure protection case you
>> would need physically different Flume agents, not just channels. I'm not
>> sure what the security gains are in isolation, if any.
>>
>> I guess to give you a proper response I would want to know what your
>> actual requirements are and possibly why.
>>
>> For what its worth, I also work in a multi-tenant environment and this
>> has never been a requirement.
>>
>> Ralph
>>
>>
>>
>> On Jan 10, 2012, at 12:42 AM, Praveen Ramachandra wrote:
>>
>> Hi arvind,
>>
>> Thanks for responding.
>>
>> if we want to model separation not only in transit but also at rest i.e.,
>> if channel has a filechannel/jdbcchannel/memorychannel backing separation
>> is required when data resides in those channels before they are shipped to
>> the next hop.
>>
>> on multi-tenant, I was trying to figure out from isolation perspective.
>> Flow isolation is required from one collecting agent tier, to aggregating
>> agent-tier and a tier that is going to deposit/deliver the events.
>>
>> "How do you propose the platform be modified in order to support this
>> use-case?" you ask, Thinking out loud now :-).
>> One option is to have a notion of a flow that is visible at flume-ng
>> level, applications will map channels to flows and sources/sinks across
>> agent tiers, can mux/demux it appropriately.
>>
>> This will also decouple mapping across agent tiers i.e.,
>>
>> If you smell scribe in my above description, I wouldn't hold it against
>> you :-). Honestly the simplicity of scribe let us prototype for our use
>> case in a matter of hour or two, compared to many days that it took to get
>> almost similar thing prototyped with flume. We even struggle today to model
>> the use cases seamlessly in flume (og or ng).
>>
>>
>> --
>> Regards,
>> Praveen Ramachandra
>>
>>
>>
>>
>>   ------------------------------
>> *From:* Arvind Prabhakar <ar...@apache.org>
>> *To:* flume-user@incubator.apache.org; Praveen Ramachandra <
>> praveen_ramachandra@yahoo.com>
>> *Sent:* Monday, January 9, 2012 11:15 PM
>> *Subject:* Re: Flume-NG Channels
>>
>> Hi Praveen,
>>
>> First to your question:
>>
>> > Did I get the modeling right with flume-ng
>>
>> More-or-less yes. The one distinction that I would like to point out
>> is that having separate source-sink end points for individual channels
>> is stemming more from your requirement than by design of flume. A
>> channel in flume implementation does not care how many sources write
>> to it or how many sink's read from it.
>>
>> > 2. Is there a better way to do it at a platform level
>> >             2.1 I know if I can write a bunch of custom sinks/sources and
>> > embed a notion of channel to which each events belong to in the message, I
>> > can effectively mux and demux the events at either ends.
>>
>> The key issue here is the layering of a multi-tenant semantic on top
>> of flows. Since fundamentally flume is not aware of the contents of
>> the events in a flow, and does not expose any client auth/id model -
>> there is no inherent support of doing this out of the box.
>>
>> Moreover, from your description it seems that the channels that
>> logically separate out the flows will operate within the same agent.
>> If that is the case, then it may be a better option to use a single
>> channel and have a multiplexing terminal sink that can route the
>> messages to the correct destination.
>>
>> >             2.2 Which means the default support for channel is also not
>> of
>> > much use
>>
>> How do you propose the platform be modified in order to support this
>> use-case?
>>
>> Thanks,
>> Arvind
>>
>>
>>
>> On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
>> <pr...@yahoo.com> wrote:
>> > They are in low 100's in the best case scenario, and could be in 1000
>> in the
>> > worst case scenario.
>> >
>> > I believe this aspect can be pretty much shielded from application if
>> the
>> > underlying platform has the right set of responsibilities.
>> >
>> >
>> > --
>> > Regards,
>> > Praveen Ramachandra
>> >
>> >
>> >
>> > ________________________________
>> > From: Ralph Goers <ra...@dslextreme.com>
>> > To: flume-user@incubator.apache.org
>> > Sent: Monday, January 9, 2012 6:53 PM
>> > Subject: Re: Flume-NG Channels
>> >
>> >
>> > On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
>> >
>> > Hi,
>> >
>> > We were trying to design a multi-tenanted system using flume-ng, where
>> each
>> > logically independent data set is modelled through a channel going
>> through
>> > the system of collectors, aggregators and delivery agents (to end
>> > destination). Each channel will carry data that logically belong
>> together.
>> > The requirement is that we should be able to bring up and tear down a
>> > channel with ease.
>> >
>> >
>> > When we completed the exercise, it turned out that we have to run a
>> separate
>> > Source/Sink, at a designated host/port combination for each channel. The
>> > issue with this is that, it is an operational overhead that we have work
>> > with net-ops to punch holes in the firewall to let tcp traffic flow on
>> > non-standard ports. I would imagine that it would be the case in many
>> > organizations as well.
>> >
>> > Two questions.
>> >
>> > 1. Did I get the modeling right with flume-ng
>> > 2. Is there a better way to do it at a platform level
>> >             2.1 I know if I can write a bunch of custom sinks/sources
>> and
>> > embed a notion of channel to which each events belong to in the
>> message, I
>> > can effectively mux and demux the events at either ends.
>> >             2.2 Which means the default support for channel is also not
>> of
>> > much use
>> >
>> >
>> > What is your target destination(s) for the tenants?  Can they all flow
>> > through a single channel in Flume and then be delivered to the correct
>> > destination by a smarter sink at the end?
>> >
>> > Ralph
>> >
>> >
>>
>>
>>
>>
>>
>>
>
>

Re: Flume-NG Channels

Posted by Ralph Goers <ra...@dslextreme.com>.

One thing I've learned from working on Log4j 2.0 is that for loops are actually a lot slower than you might think. In a configuration that desires a single channel there should be no for loop. Instead, it should go directly to the channel. In the case of multiple channels then the "channel" that is selected should be a multiplexing channel that is configured with other channels. The for loop (or while loop) is in the multiplexing channel.

Thus, your ChannelSelector could (and should), in fact, be a Channel that can select any or all of its configured channels.
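
A rough Java sketch of that idea, for illustration only. The `SimpleChannel` interface and the class names below are hypothetical and are not Flume's actual Channel API. The point is that the source always writes to exactly one channel object, and any fan-out loop lives inside a composite channel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified channel abstraction (not Flume's real Channel interface).
interface SimpleChannel {
    void put(String event);
}

// A plain channel that buffers events in memory.
class ListChannel implements SimpleChannel {
    final List<String> events = new ArrayList<>();

    @Override
    public void put(String event) {
        events.add(event);
    }
}

// Composite: looks like a single channel to the source,
// but forwards each event to all of its configured channels.
class MultiplexingChannel implements SimpleChannel {
    private final List<SimpleChannel> targets;

    MultiplexingChannel(List<SimpleChannel> targets) {
        this.targets = new ArrayList<>(targets);
    }

    @Override
    public void put(String event) {
        // The only loop in the write path lives here.
        for (SimpleChannel target : targets) {
            target.put(event);
        }
    }
}

class MultiplexingChannelDemo {
    public static void main(String[] args) {
        ListChannel a = new ListChannel();
        ListChannel b = new ListChannel();

        // Single-channel configuration: the source holds the channel directly, no loop.
        SimpleChannel single = a;
        single.put("e1");

        // Multi-channel configuration: the source still holds one channel object.
        SimpleChannel fanOut = new MultiplexingChannel(Arrays.asList(a, b));
        fanOut.put("e2");

        System.out.println(a.events + " " + b.events);
    }
}
```

A selecting variant would decide inside `put` which targets receive the event instead of forwarding to all of them; the source-side code would not change.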

FWIW, in Log4j 2 in the XML configuration you would specify

<RollingFileAppender name="MainAppender" ...>
  <MarkerFilter marker="MyMarker"/>
</RollingFileAppender>

or 

<RollingFileAppender name="MainAppender" ...>
  <filters>
    <MarkerFilter marker="MyMarker"/>
    <ThresholdFilter level="DEBUG"/>
  </filters>
</RollingFileAppender>

The filters element is actually a CompositeFilter that invokes each of its configured filters in turn.

Ralph

On Jan 11, 2012, at 5:55 PM, Arvind Prabhakar wrote:

> Hi Praveen,
> 
> Here is what I could muster up after some thought on this use-case:
> 
> We modify the source interface to accept a "Channel Processor", a new component that is responsible for putting the event into one or more channels. 
> A channel processor will delegate the selection of the channel to place the event on via another component called "Channel Selector" which is responsible for selecting the appropriate channel from the list of channels the source is configured with.
> The default implementation of channel selector in the channel processor will be a "replicating channel selector" which will result in the event being copied over to all configured channels.
> Another implementation of the channel selector will be "Mapping Channel Selector" which will allow events to be mapped to different channel(s) based on the value of a specified header.
> 
> With this facility, you will be able to inject headers into events at the point of origination and then configure the mapping channel selector at each source in the pipeline to place the event on separate channels as desired based on the value of the header.
> 
> Do you think this will adequately address your use case? If not, what do you think is missing here.
> 
> Thanks,
> Arvind
> 
> 
> On Tue, Jan 10, 2012 at 8:03 PM, Praveen Ramachandra <pr...@yahoo.com> wrote:
> Hi
> 
> Security is not the reason for isolation.
> 
> Isolation could be used to realize quite a few quality attributes of the system, e.g., many aspects of QoS.
> 
> Regardless, if we have specific event handling requirement that are different for each "kind" of data the question is how do one realize it using flume-ng.
> 
> As it stands currently, sources/sinks & channels are tied to the hip, which is fine. Only issue is requiring to allocate dedicated host/port to achieve. 
> 
> 
> As I had mentioned in my first email, one could develop custom sources/sinks and configuration that goes along with to mux/demux events that are flowing through the system. 
> 
> Question to ask ourself is, why is there a need to have a change in deployment to accommodate a new "flow" in the system.
> 
> 
> 
> --
> Regards,
> Praveen Ramachandra
> 
> From: Ralph Goers <ra...@dslextreme.com>
> To: flume-user@incubator.apache.org 
> Sent: Tuesday, January 10, 2012 6:01 PM
> Subject: Re: Flume-NG Channels
> 
> When you speak of flow isolation are you doing that for security, failure protection or for some other reason?  From a failure protection case you would need physically different Flume agents, not just channels. I'm not sure what the security gains are in isolation, if any.
> 
> I guess to give you a proper response I would want to know what your actual requirements are and possibly why. 
> 
> For what its worth, I also work in a multi-tenant environment and this has never been a requirement. 
> 
> Ralph
> 
> 
> 
> On Jan 10, 2012, at 12:42 AM, Praveen Ramachandra wrote:
> 
>> Hi arvind,
>> 
>> Thanks for responding.
>> 
>> if we want to model separation not only in transit but also at rest i.e., if channel has a filechannel/jdbcchannel/memorychannel backing separation is required when data resides in those channels before they are shipped to the next hop.
>> 
>> on multi-tenant, I was trying to figure out from isolation perspective. Flow isolation is required from one collecting agent tier, to aggregating agent-tier and a tier that is going to deposit/deliver the events.
>> 
>> "How do you propose the platform be modified in order to support this use-case?" you ask, Thinking out loud now :-).
>> One option is to have a notion of a flow that is visible at flume-ng level, applications will map channels to flows and sources/sinks across agent tiers, can mux/demux it appropriately.
>> 
>> This will also decouple mapping across agent tiers i.e., 
>> 
>> If you smell scribe in my above description, I wouldn't hold it against you :-). Honestly the simplicity of scribe let us prototype for our use case in a matter of hour or two, compared to many days that it took to get almost similar thing prototyped with flume. We even struggle today to model the use cases seamlessly in flume (og or ng).
>> 
>> 
>> --
>> Regards,
>> Praveen Ramachandra
>> 
>> 
>> 
>> 
>> From: Arvind Prabhakar <ar...@apache.org>
>> To: flume-user@incubator.apache.org; Praveen Ramachandra <pr...@yahoo.com> 
>> Sent: Monday, January 9, 2012 11:15 PM
>> Subject: Re: Flume-NG Channels
>> 
>> Hi Praveen,
>> 
>> First to your question:
>> 
>> > Did I get the modeling right with flume-ng
>> 
>> More-or-less yes. The one distinction that I would like to point out
>> is that having separate source-sink end points for individual channels
>> is stemming more from your requirement than by design of flume. A
>> channel in flume implementation does not care how many sources write
>> to it or how many sink's read from it.
>> 
>> > 2. Is there a better way to do it at a platform level
>> >             2.1 I know if I can write a bunch of custom sinks/sources and
>> > embed a notion of channel to which each events belong to in the message, I
>> > can effectively mux and demux the events at either ends.
>> 
>> The key issue here is the layering of a multi-tenant semantic on top
>> of flows. Since fundamentally flume is not aware of the contents of
>> the events in a flow, and does not expose any client auth/id model -
>> there is no inherent support of doing this out of the box.
>> 
>> Moreover, from your description it seems that the channels that
>> logically separate out the flows will operate within the same agent.
>> If that is the case, then it may be a better option to use a single
>> channel and have a multiplexing terminal sink that can route the
>> messages to the correct destination.
>> 
>> >             2.2 Which means the default support for channel is also not of
>> > much use
>> 
>> How do you propose the platform be modified in order to support this use-case?
>> 
>> Thanks,
>> Arvind
>> 
>> 
>> 
>> On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
>> <pr...@yahoo.com> wrote:
>> > They are in low 100's in the best case scenario, and could be in 1000 in the
>> > worst case scenario.
>> >
>> > I believe this aspect can be pretty much shielded from application if the
>> > underlying platform has the right set of responsibilities.
>> >
>> >
>> > --
>> > Regards,
>> > Praveen Ramachandra
>> >
>> >
>> >
>> > ________________________________
>> > From: Ralph Goers <ra...@dslextreme.com>
>> > To: flume-user@incubator.apache.org
>> > Sent: Monday, January 9, 2012 6:53 PM
>> > Subject: Re: Flume-NG Channels
>> >
>> >
>> > On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
>> >
>> > Hi,
>> >
>> > We were trying to design a multi-tenanted system using flume-ng, where each
>> > logically independent data set is modelled through a channel going through
>> > the system of collectors, aggregators and delivery agents (to end
>> > destination). Each channel will carry data that logically belong together.
>> > The requirement is that we should be able to bring up and tear down a
>> > channel with ease.
>> >
>> >
>> > When we completed the exercise, it turned out that we have to run a separate
>> > Source/Sink, at a designated host/port combination for each channel. The
>> > issue with this is that, it is an operational overhead that we have work
>> > with net-ops to punch holes in the firewall to let tcp traffic flow on
>> > non-standard ports. I would imagine that it would be the case in many
>> > organizations as well.
>> >
>> > Two questions.
>> >
>> > 1. Did I get the modeling right with flume-ng
>> > 2. Is there a better way to do it at a platform level
>> >             2.1 I know if I can write a bunch of custom sinks/sources and
>> > embed a notion of channel to which each events belong to in the message, I
>> > can effectively mux and demux the events at either ends.
>> >             2.2 Which means the default support for channel is also not of
>> > much use
>> >
>> >
>> > What is your target destination(s) for the tenants?  Can they all flow
>> > through a single channel in Flume and then be delivered to the correct
>> > destination by a smarter sink at the end?
>> >
>> > Ralph
>> >
>> >
>> 
>> 
> 
> 
> 
>

Re: Flume-NG Channels

Posted by Arvind Prabhakar <ar...@apache.org>.

Hi Praveen,

Here is what I could muster up after some thought on this use-case:


   - We modify the source interface to accept a "Channel Processor", a new
   component that is responsible for putting the event into one or more
   channels.
   - A channel processor will delegate the choice of channel(s) for each
   event to another component called "Channel Selector", which is
   responsible for selecting the appropriate channel(s) from the list of
   channels the source is configured with.
   - The default implementation of channel selector in the channel
   processor will be a "replicating channel selector" which will result in the
   event being copied over to all configured channels.
   - Another implementation of the channel selector will be "Mapping
   Channel Selector" which will allow events to be mapped to different
   channel(s) based on the value of a specified header.


With this facility, you will be able to inject headers into events at the
point of origination and then configure the mapping channel selector at
each source in the pipeline to place the event on separate channels as
desired based on the value of the header.
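
The proposal above could be sketched roughly as follows. This is a minimal illustration under stated assumptions: every type name here (`SketchEvent`, `SketchChannel`, `SketchChannelSelector`, and so on) and the `tenant` header are hypothetical and are not Flume's actual API. Dropping events whose header value has no mapping is also just an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All types below are hypothetical illustrations, not Flume's actual API.
class SketchEvent {
    final Map<String, String> headers = new HashMap<>();
    final String body;

    SketchEvent(String body) {
        this.body = body;
    }
}

class SketchChannel {
    final String name;
    final List<SketchEvent> events = new ArrayList<>();

    SketchChannel(String name) {
        this.name = name;
    }

    void put(SketchEvent event) {
        events.add(event);
    }
}

// Picks the channel(s) an event should be placed on.
interface SketchChannelSelector {
    List<SketchChannel> select(SketchEvent event, List<SketchChannel> configured);
}

// Default behaviour: copy the event to every configured channel.
class ReplicatingSelector implements SketchChannelSelector {
    @Override
    public List<SketchChannel> select(SketchEvent event, List<SketchChannel> configured) {
        return configured;
    }
}

// Maps the value of one header to one or more channel names.
class MappingSelector implements SketchChannelSelector {
    private final String header;
    private final Map<String, List<String>> mapping;

    MappingSelector(String header, Map<String, List<String>> mapping) {
        this.header = header;
        this.mapping = mapping;
    }

    @Override
    public List<SketchChannel> select(SketchEvent event, List<SketchChannel> configured) {
        // Assumption of this sketch: unmapped header values select no channel.
        List<String> names =
                mapping.getOrDefault(event.headers.get(header), Collections.emptyList());
        List<SketchChannel> selected = new ArrayList<>();
        for (SketchChannel channel : configured) {
            if (names.contains(channel.name)) {
                selected.add(channel);
            }
        }
        return selected;
    }
}

// The source hands events to the processor, which delegates to the selector.
class SketchChannelProcessor {
    private final List<SketchChannel> channels;
    private final SketchChannelSelector selector;

    SketchChannelProcessor(List<SketchChannel> channels, SketchChannelSelector selector) {
        this.channels = channels;
        this.selector = selector;
    }

    void process(SketchEvent event) {
        for (SketchChannel channel : selector.select(event, channels)) {
            channel.put(event);
        }
    }
}

class ChannelProcessorDemo {
    public static void main(String[] args) {
        SketchChannel c1 = new SketchChannel("c1");
        SketchChannel c2 = new SketchChannel("c2");

        Map<String, List<String>> mapping = new HashMap<>();
        mapping.put("tenantA", Arrays.asList("c1"));
        mapping.put("tenantB", Arrays.asList("c2"));

        SketchChannelProcessor processor = new SketchChannelProcessor(
                Arrays.asList(c1, c2), new MappingSelector("tenant", mapping));

        SketchEvent event = new SketchEvent("hello");
        event.headers.put("tenant", "tenantA"); // header injected at the point of origination
        processor.process(event);

        System.out.println("c1=" + c1.events.size() + " c2=" + c2.events.size());
    }
}
```

With this shape, adding a new flow would mean adding a mapping entry and a channel in configuration rather than a new source on a dedicated host/port.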

Do you think this will adequately address your use case? If not, what do
you think is missing here?

Thanks,
Arvind


On Tue, Jan 10, 2012 at 8:03 PM, Praveen Ramachandra <
praveen_ramachandra@yahoo.com> wrote:

> Hi
>
> Security is not the reason for isolation.
>
> Isolation could be used to realize quite a few quality attributes of the
> system, e.g., many aspects of QoS.
>
> Regardless, if we have specific event handling requirement that are
> different for each "kind" of data the question is how do one realize it
> using flume-ng.
>
> As it stands currently, sources/sinks & channels are tied to the hip,
> which is fine. Only issue is requiring to allocate dedicated host/port to
> achieve.
>
>
> As I had mentioned in my first email, one could develop custom
> sources/sinks and configuration that goes along with to mux/demux events
> that are flowing through the system.
>
> Question to ask ourself is, why is there a need to have a change in
> deployment to accommodate a new "flow" in the system.
>
>
>
> --
> Regards,
> Praveen Ramachandra
>
>   ------------------------------
> *From:* Ralph Goers <ra...@dslextreme.com>
> *To:* flume-user@incubator.apache.org
> *Sent:* Tuesday, January 10, 2012 6:01 PM
> *Subject:* Re: Flume-NG Channels
>
> When you speak of flow isolation are you doing that for security, failure
> protection or for some other reason?  From a failure protection case you
> would need physically different Flume agents, not just channels. I'm not
> sure what the security gains are in isolation, if any.
>
> I guess to give you a proper response I would want to know what your
> actual requirements are and possibly why.
>
> For what its worth, I also work in a multi-tenant environment and this has
> never been a requirement.
>
> Ralph
>
>
>
> On Jan 10, 2012, at 12:42 AM, Praveen Ramachandra wrote:
>
> Hi arvind,
>
> Thanks for responding.
>
> if we want to model separation not only in transit but also at rest i.e.,
> if channel has a filechannel/jdbcchannel/memorychannel backing separation
> is required when data resides in those channels before they are shipped to
> the next hop.
>
> on multi-tenant, I was trying to figure out from isolation perspective.
> Flow isolation is required from one collecting agent tier, to aggregating
> agent-tier and a tier that is going to deposit/deliver the events.
>
> "How do you propose the platform be modified in order to support this
> use-case?" you ask, Thinking out loud now :-).
> One option is to have a notion of a flow that is visible at flume-ng
> level, applications will map channels to flows and sources/sinks across
> agent tiers, can mux/demux it appropriately.
>
> This will also decouple mapping across agent tiers i.e.,
>
> If you smell scribe in my above description, I wouldn't hold it against
> you :-). Honestly the simplicity of scribe let us prototype for our use
> case in a matter of hour or two, compared to many days that it took to get
> almost similar thing prototyped with flume. We even struggle today to model
> the use cases seamlessly in flume (og or ng).
>
>
> --
> Regards,
> Praveen Ramachandra
>
>
>
>
>   ------------------------------
> *From:* Arvind Prabhakar <ar...@apache.org>
> *To:* flume-user@incubator.apache.org; Praveen Ramachandra <
> praveen_ramachandra@yahoo.com>
> *Sent:* Monday, January 9, 2012 11:15 PM
> *Subject:* Re: Flume-NG Channels
>
> Hi Praveen,
>
> First to your question:
>
> > Did I get the modeling right with flume-ng
>
> More-or-less yes. The one distinction that I would like to point out
> is that having separate source-sink end points for individual channels
> is stemming more from your requirement than by design of flume. A
> channel in flume implementation does not care how many sources write
> to it or how many sink's read from it.
>
> > 2. Is there a better way to do it at a platform level
> >             2.1 I know if I can write a bunch of custom sinks/sources and
> > embed a notion of channel to which each events belong to in the message,
> I
> > can effectively mux and demux the events at either ends.
>
> The key issue here is the layering of a multi-tenant semantic on top
> of flows. Since fundamentally flume is not aware of the contents of
> the events in a flow, and does not expose any client auth/id model -
> there is no inherent support of doing this out of the box.
>
> Moreover, from your description it seems that the channels that
> logically separate out the flows will operate within the same agent.
> If that is the case, then it may be a better option to use a single
> channel and have a multiplexing terminal sink that can route the
> messages to the correct destination.
>
> >             2.2 Which means the default support for channel is also not
> of
> > much use
>
> How do you propose the platform be modified in order to support this
> use-case?
>
> Thanks,
> Arvind
>
>
>
> On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
> <pr...@yahoo.com> wrote:
> > They are in low 100's in the best case scenario, and could be in 1000 in
> the
> > worst case scenario.
> >
> > I believe this aspect can be pretty much shielded from application if the
> > underlying platform has the right set of responsibilities.
> >
> >
> > --
> > Regards,
> > Praveen Ramachandra
> >
> >
> >
> > ________________________________
> > From: Ralph Goers <ra...@dslextreme.com>
> > To: flume-user@incubator.apache.org
> > Sent: Monday, January 9, 2012 6:53 PM
> > Subject: Re: Flume-NG Channels
> >
> >
> > On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
> >
> > Hi,
> >
> > We were trying to design a multi-tenanted system using flume-ng, where
> each
> > logically independent data set is modelled through a channel going
> through
> > the system of collectors, aggregators and delivery agents (to end
> > destination). Each channel will carry data that logically belong
> together.
> > The requirement is that we should be able to bring up and tear down a
> > channel with ease.
> >
> >
> > When we completed the exercise, it turned out that we have to run a
> separate
> > Source/Sink, at a designated host/port combination for each channel. The
> > issue with this is that, it is an operational overhead that we have work
> > with net-ops to punch holes in the firewall to let tcp traffic flow on
> > non-standard ports. I would imagine that it would be the case in many
> > organizations as well.
> >
> > Two questions.
> >
> > 1. Did I get the modeling right with flume-ng
> > 2. Is there a better way to do it at a platform level
> >             2.1 I know if I can write a bunch of custom sinks/sources and
> > embed a notion of channel to which each events belong to in the message,
> I
> > can effectively mux and demux the events at either ends.
> >             2.2 Which means the default support for channel is also not
> of
> > much use
> >
> >
> > What is your target destination(s) for the tenants?  Can they all flow
> > through a single channel in Flume and then be delivered to the correct
> > destination by a smarter sink at the end?
> >
> > Ralph
> >
> >
>
>
>
>
>
>

Re: Flume-NG Channels

Posted by Praveen Ramachandra <pr...@yahoo.com>.

Hi

Security is not the reason for isolation.

Isolation could be used to realize quite a few quality attributes of the system, e.g., many aspects of QoS.

Regardless, if we have specific event handling requirements that are different for each "kind" of data, the question is how does one realize them using flume-ng.

As it stands currently, sources/sinks & channels are tied at the hip, which is fine. The only issue is having to allocate a dedicated host/port to achieve this.


As I had mentioned in my first email, one could develop custom sources/sinks, and the configuration that goes along with them, to mux/demux events that are flowing through the system.

The question to ask ourselves is: why is there a need to change the deployment to accommodate a new "flow" in the system?



--
Regards,
Praveen Ramachandra


________________________________
 From: Ralph Goers <ra...@dslextreme.com>
To: flume-user@incubator.apache.org 
Sent: Tuesday, January 10, 2012 6:01 PM
Subject: Re: Flume-NG Channels
 

When you speak of flow isolation are you doing that for security, failure protection or for some other reason?  From a failure protection case you would need physically different Flume agents, not just channels. I'm not sure what the security gains are in isolation, if any.

I guess to give you a proper response I would want to know what your actual requirements are and possibly why. 

For what its worth, I also work in a multi-tenant environment and this has never been a requirement. 

Ralph




On Jan 10, 2012, at 12:42 AM, Praveen Ramachandra wrote:

Hi arvind,
>
>
>Thanks for responding.
>
>
>if we want to model separation not only in transit but also at rest i.e., if channel has a filechannel/jdbcchannel/memorychannel backing separation is required when data resides in those channels before they are shipped to the next hop.
>
>
>on multi-tenant, I was trying to figure out from isolation perspective. Flow isolation is required from one collecting agent tier, to aggregating agent-tier and a tier that is going to deposit/deliver the events.
>
>
>"How do you propose the platform be modified in order to support this use-case?" you ask, Thinking out loud now :-).
>
>One option is to have a notion of a flow that is visible at flume-ng level, applications will map channels to flows and sources/sinks across agent tiers, can mux/demux it appropriately.
>
>
>This will also decouple mapping across agent tiers i.e., 
>
>
>
>If you smell scribe in my above description, I wouldn't hold it against you :-). Honestly the simplicity of scribe let us prototype for our use case in a matter of hour or two, compared to many days that it took to get almost similar thing prototyped with flume. We even struggle today to model the use cases seamlessly in flume (og or ng).
>
>
>
>
>--
>Regards,
>Praveen Ramachandra
>
>
>
>
>
>
>
>
>
>________________________________
> From: Arvind Prabhakar <ar...@apache.org>
>To: flume-user@incubator.apache.org; Praveen Ramachandra <pr...@yahoo.com> 
>Sent: Monday, January 9, 2012 11:15 PM
>Subject: Re: Flume-NG Channels
> 
>Hi Praveen,
>
>First to your question:
>
>> Did I get the modeling right with flume-ng
>
>More-or-less yes. The one distinction that I would like to point out
>is that having separate source-sink end points for individual channels
>is stemming more from your requirement than by design of flume. A
>channel in flume implementation does not care how many sources write
>to it or how many sink's read from it.
>
>> 2. Is there a better way to do it at a platform level
>>             2.1 I know if I can write a bunch of custom sinks/sources and
>> embed a notion of channel to which each events belong to in the message, I
>> can effectively mux and demux the events at either ends.
>
>The key issue here is the layering of a multi-tenant semantic on top
>of flows. Since fundamentally flume is not aware of the contents of
>the events in a flow, and does not expose any client auth/id model -
>there is no inherent support of doing this out of the box.
>
>Moreover, from your description it seems that the channels that
>logically separate out the flows will operate within the same agent.
>If that is the case, then it may be a better option to use a single
>channel and have a multiplexing terminal sink that can route the
>messages to the correct destination.
>
>>             2.2 Which means the default support for channel is also not of
>> much use
>
>How do you propose the platform be modified in order to support this use-case?
>
>Thanks,
>Arvind
>
>
>
>On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
><pr...@yahoo.com> wrote:
>> They are in low 100's in the best case scenario, and could be in 1000 in the
>> worst case scenario.
>>
>> I believe this aspect can be pretty much shielded from application if the
>> underlying platform has the right set of responsibilities.
>>
>>
>> --
>> Regards,
>> Praveen Ramachandra
>>
>>
>>
>> ________________________________
>> From: Ralph Goers <ra...@dslextreme.com>
>> To: flume-user@incubator.apache.org
>> Sent: Monday, January 9, 2012 6:53 PM
>> Subject: Re: Flume-NG Channels
>>
>>
>> On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
>>
>> Hi,
>>
>> We were trying to design a multi-tenanted system using flume-ng, where each
>> logically independent data set is modelled through a channel going through
>> the system of collectors, aggregators and delivery agents (to end
>> destination). Each channel will carry data that logically belong together.
>> The requirement is that we should be able to bring up and tear down a
>> channel with ease.
>>
>>
>> When we completed the exercise, it turned out that we have to run a separate
>> Source/Sink, at a designated host/port combination for each channel. The
>> issue with this is that, it is an operational overhead that we have work
>> with net-ops to punch holes in the firewall to let tcp traffic flow on
>> non-standard ports. I would imagine that it would be the case in many
>> organizations as well.
>>
>> Two questions.
>>
>> 1. Did I get the modeling right with flume-ng
>> 2. Is there a better way to do it at a platform level
>>             2.1 I know if I can write a bunch of custom sinks/sources and
>> embed a notion of channel to which each events belong to in the message, I
>> can effectively mux and demux the events at either ends.
>>             2.2 Which means the default support for channel is also not of
>> much use
>>
>>
>> What is your target destination(s) for the tenants?  Can they all flow
>> through a single channel in Flume and then be delivered to the correct
>> destination by a smarter sink at the end?
>>
>> Ralph
>>
>>
>
>
>

Re: Flume-NG Channels

Posted by Ralph Goers <ra...@dslextreme.com>.

When you speak of flow isolation, are you doing that for security, failure protection, or for some other reason?  For failure protection you would need physically different Flume agents, not just channels. I'm not sure what the security gains are in isolation, if any.

I guess to give you a proper response I would want to know what your actual requirements are and possibly why. 

For what it's worth, I also work in a multi-tenant environment and this has never been a requirement.

Ralph



On Jan 10, 2012, at 12:42 AM, Praveen Ramachandra wrote:

> Hi arvind,
> 
> Thanks for responding.
> 
> if we want to model separation not only in transit but also at rest i.e., if channel has a filechannel/jdbcchannel/memorychannel backing separation is required when data resides in those channels before they are shipped to the next hop.
> 
> on multi-tenant, I was trying to figure out from isolation perspective. Flow isolation is required from one collecting agent tier, to aggregating agent-tier and a tier that is going to deposit/deliver the events.
> 
> "How do you propose the platform be modified in order to support this use-case?" you ask, Thinking out loud now :-).
> One option is to have a notion of a flow that is visible at flume-ng level, applications will map channels to flows and sources/sinks across agent tiers, can mux/demux it appropriately.
> 
> This will also decouple mapping across agent tiers i.e., 
> 
> If you smell scribe in my above description, I wouldn't hold it against you :-). Honestly the simplicity of scribe let us prototype for our use case in a matter of hour or two, compared to many days that it took to get almost similar thing prototyped with flume. We even struggle today to model the use cases seamlessly in flume (og or ng).
> 
> 
> --
> Regards,
> Praveen Ramachandra
> 
> 
> 
> 
> From: Arvind Prabhakar <ar...@apache.org>
> To: flume-user@incubator.apache.org; Praveen Ramachandra <pr...@yahoo.com> 
> Sent: Monday, January 9, 2012 11:15 PM
> Subject: Re: Flume-NG Channels
> 
> Hi Praveen,
> 
> First to your question:
> 
> > Did I get the modeling right with flume-ng
> 
> More-or-less yes. The one distinction that I would like to point out
> is that having separate source-sink end points for individual channels
> is stemming more from your requirement than by design of flume. A
> channel in flume implementation does not care how many sources write
> to it or how many sink's read from it.
> 
> > 2. Is there a better way to do it at a platform level
> >             2.1 I know if I can write a bunch of custom sinks/sources and
> > embed a notion of channel to which each events belong to in the message, I
> > can effectively mux and demux the events at either ends.
> 
> The key issue here is the layering of a multi-tenant semantic on top
> of flows. Since fundamentally flume is not aware of the contents of
> the events in a flow, and does not expose any client auth/id model -
> there is no inherent support of doing this out of the box.
> 
> Moreover, from your description it seems that the channels that
> logically separate out the flows will operate within the same agent.
> If that is the case, then it may be a better option to use a single
> channel and have a multiplexing terminal sink that can route the
> messages to the correct destination.
> 
> >             2.2 Which means the default support for channel is also not of
> > much use
> 
> How do you propose the platform be modified in order to support this use-case?
> 
> Thanks,
> Arvind
> 
> 
> 
> On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
> <pr...@yahoo.com> wrote:
> > They are in low 100's in the best case scenario, and could be in 1000 in the
> > worst case scenario.
> >
> > I believe this aspect can be pretty much shielded from application if the
> > underlying platform has the right set of responsibilities.
> >
> >
> > --
> > Regards,
> > Praveen Ramachandra
> >
> >
> >
> > ________________________________
> > From: Ralph Goers <ra...@dslextreme.com>
> > To: flume-user@incubator.apache.org
> > Sent: Monday, January 9, 2012 6:53 PM
> > Subject: Re: Flume-NG Channels
> >
> >
> > On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
> >
> > Hi,
> >
> > We were trying to design a multi-tenanted system using flume-ng, where each
> > logically independent data set is modelled through a channel going through
> > the system of collectors, aggregators and delivery agents (to end
> > destination). Each channel will carry data that logically belong together.
> > The requirement is that we should be able to bring up and tear down a
> > channel with ease.
> >
> >
> > When we completed the exercise, it turned out that we have to run a separate
> > Source/Sink, at a designated host/port combination for each channel. The
> > issue with this is that, it is an operational overhead that we have work
> > with net-ops to punch holes in the firewall to let tcp traffic flow on
> > non-standard ports. I would imagine that it would be the case in many
> > organizations as well.
> >
> > Two questions.
> >
> > 1. Did I get the modeling right with flume-ng
> > 2. Is there a better way to do it at a platform level
> >             2.1 I know if I can write a bunch of custom sinks/sources and
> > embed a notion of channel to which each events belong to in the message, I
> > can effectively mux and demux the events at either ends.
> >             2.2 Which means the default support for channel is also not of
> > much use
> >
> >
> > What is your target destination(s) for the tenants?  Can they all flow
> > through a single channel in Flume and then be delivered to the correct
> > destination by a smarter sink at the end?
> >
> > Ralph
> >
> >
> 
>

Re: Flume-NG Channels

Posted by Praveen Ramachandra <pr...@yahoo.com>.

Small correction.

If you smell scribe's notion of "category" in my above description, I wouldn't hold it against you :-). 

________________________________
 From: Praveen Ramachandra <pr...@yahoo.com>
To: "flume-user@incubator.apache.org" <fl...@incubator.apache.org>; "arvind@cloudera.com" <ar...@cloudera.com> 
Sent: Tuesday, January 10, 2012 2:12 PM
Subject: Re: Flume-NG Channels

Hi arvind,

Thanks for responding.

if we want to model separation not only in transit but also at rest i.e., if channel has a filechannel/jdbcchannel/memorychannel backing separation is required when data resides in those channels before they are shipped to the next hop.

on multi-tenant, I was trying to figure out from isolation perspective. Flow isolation is required from one collecting agent tier, to aggregating agent-tier and a tier that is going to deposit/deliver the events.

"How do you propose the platform be modified in order to support this use-case?" you ask, Thinking out loud now :-).

One option is to have a notion of a flow that is visible at flume-ng level, applications will map channels to flows and sources/sinks across agent tiers, can mux/demux it appropriately.

This will also decouple mapping across agent tiers i.e., 

If you smell scribe in my above description, I wouldn't hold it against you :-). Honestly the simplicity of scribe let us prototype for our use case in a matter of hour or two, compared to many days that it took to get almost similar thing prototyped with flume. We even struggle today to model the use cases seamlessly in flume (og or ng).

--
Regards,
Praveen Ramachandra

________________________________
 From: Arvind Prabhakar <ar...@apache.org>
To: flume-user@incubator.apache.org; Praveen Ramachandra <pr...@yahoo.com> 
Sent: Monday, January 9, 2012 11:15 PM
Subject: Re: Flume-NG Channels

Hi Praveen,

First to your question:

> Did I get the modeling right with flume-ng

More-or-less yes. The one distinction that I would like to point out
is that having separate source-sink end points for individual channels
is stemming more from your requirement than by design of flume. A
channel in flume implementation does not care how many sources write
to it or how many sink's read from it.

> 2. Is there a better way to do it at a platform level
>             2.1 I know if I can write a bunch of custom sinks/sources and
> embed a notion of channel to which each events belong to in the message, I
> can effectively mux and demux the events at either ends.

The key issue here is the layering of a multi-tenant semantic on top
of flows. Since fundamentally flume is not aware of the contents of
the events in a flow, and does not expose any client auth/id model -
there is no inherent support of doing this out of the box.

Moreover, from your description it seems that the channels that
logically separate out the flows will operate within the same agent.
If that is the case, then it may be a better option to use a single
channel and have a multiplexing terminal sink that can route the
messages to the correct destination.

>             2.2 Which means the default support for channel is also not of
> much use

How do you propose the platform be modified in order to support this use-case?

Thanks,
Arvind

On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
<pr...@yahoo.com> wrote:
> They are in low 100's in the best case scenario, and could be in 1000 in the
> worst case scenario.
>
> I believe this aspect can be pretty much shielded from application if the
> underlying platform has the right set of responsibilities.
>
>
> --
> Regards,
> Praveen Ramachandra
>
>
>
> ________________________________
> From: Ralph Goers <ra...@dslextreme.com>
> To: flume-user@incubator.apache.org
> Sent: Monday, January 9, 2012 6:53 PM
> Subject: Re: Flume-NG Channels
>
>
> On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
>
> Hi,
>
> We were trying to design a multi-tenanted system using flume-ng, where each
> logically independent data set is modelled through a channel going through
> the system of collectors, aggregators and delivery agents (to end
> destination). Each channel will carry data that logically belong together.
> The requirement is that we should be able to bring up and tear down a
> channel with ease.
>
>
> When we completed the exercise, it turned out that we have to run a separate
> Source/Sink, at a designated host/port combination for each channel. The
> issue with this is that, it is an operational overhead that we have work
> with net-ops to punch holes in the firewall to let tcp traffic flow on
> non-standard ports. I would imagine that it would be the case in many
> organizations as well.
>
> Two questions.
>
> 1. Did I get the modeling right with flume-ng
> 2. Is there a better way to do it at a platform level
>             2.1 I know if I can write a bunch of custom sinks/sources and
> embed a notion of channel to which each events belong to in the message, I
> can effectively mux and demux the events at either ends.
>             2.2 Which means the default support for channel is also not of
> much use
>
>
> What is your target destination(s) for the tenants?  Can they all flow
> through a single channel in Flume and then be delivered to the correct
> destination by a smarter sink at the end?
>
> Ralph
>
>

Re: Flume-NG Channels

Posted by Praveen Ramachandra <pr...@yahoo.com>.

Hi arvind,

Thanks for responding.

If we want to model separation not only in transit but also at rest, i.e., if the channel has a filechannel/jdbcchannel/memorychannel backing, separation is required while data resides in those channels before it is shipped to the next hop.

On multi-tenancy, I was approaching it from an isolation perspective. Flow isolation is required from the collecting agent tier, through the aggregating agent tier, to the tier that deposits/delivers the events.

"How do you propose the platform be modified in order to support this use-case?" you ask. Thinking out loud now :-).

One option is to have a notion of a flow that is visible at the flume-ng level. Applications would map channels to flows, and sources/sinks across agent tiers could mux/demux the flows appropriately.

This will also decouple mapping across agent tiers i.e., 
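
A minimal sketch of the flow idea described above, written in plain Python rather than against any Flume API. Everything here is an assumption made for illustration: the "flow" header name, the tag_event helper, and the FlowDemux class do not exist in flume-ng. The sending tier tags each event with its flow and ships all flows over one connection; the receiving tier keeps one queue per flow so that flows stay separate at rest and can be brought up or torn down independently.

```python
from collections import deque


def tag_event(flow, body):
    """Sending side (mux): attach the flow name as a header.

    The header name "flow" is a hypothetical choice for this sketch.
    """
    return {"headers": {"flow": flow}, "body": body}


class FlowDemux:
    """Receiving side (demux): one queue per flow, so flows stay separate at rest."""

    def __init__(self):
        self.channels = {}

    def bring_up(self, flow):
        # Create the per-flow queue if it does not exist yet.
        self.channels.setdefault(flow, deque())

    def tear_down(self, flow):
        # Remove the flow and return whatever was still buffered for it.
        return list(self.channels.pop(flow, ()))

    def receive(self, event):
        flow = event["headers"].get("flow")
        if flow not in self.channels:
            # Unknown or torn-down flow: reject rather than guess a destination.
            return False
        self.channels[flow].append(event["body"])
        return True


# Usage: three events arrive over a single connection, one for an unknown flow.
demux = FlowDemux()
demux.bring_up("tenant-a")
demux.bring_up("tenant-b")
wire = [tag_event("tenant-a", "x"), tag_event("tenant-b", "y"), tag_event("tenant-c", "z")]
accepted = [demux.receive(event) for event in wire]
```

With this shape, adding a tenant is a call to bring_up on each tier rather than a new host/port pair, which is the operational cost raised in the original question.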

If you smell scribe in my above description, I wouldn't hold it against you :-). Honestly, the simplicity of scribe let us prototype our use case in a matter of an hour or two, compared to the many days it took to get something similar prototyped with flume. Even today we struggle to model the use cases seamlessly in flume (OG or NG).

--
Regards,
Praveen Ramachandra

________________________________
 From: Arvind Prabhakar <ar...@apache.org>
To: flume-user@incubator.apache.org; Praveen Ramachandra <pr...@yahoo.com> 
Sent: Monday, January 9, 2012 11:15 PM
Subject: Re: Flume-NG Channels

Hi Praveen,

First to your question:

> Did I get the modeling right with flume-ng

More-or-less yes. The one distinction that I would like to point out
is that having separate source-sink end points for individual channels
is stemming more from your requirement than by design of flume. A
channel in flume implementation does not care how many sources write
to it or how many sink's read from it.

> 2. Is there a better way to do it at a platform level
>             2.1 I know if I can write a bunch of custom sinks/sources and
> embed a notion of channel to which each events belong to in the message, I
> can effectively mux and demux the events at either ends.

The key issue here is the layering of a multi-tenant semantic on top
of flows. Since fundamentally flume is not aware of the contents of
the events in a flow, and does not expose any client auth/id model -
there is no inherent support of doing this out of the box.

Moreover, from your description it seems that the channels that
logically separate out the flows will operate within the same agent.
If that is the case, then it may be a better option to use a single
channel and have a multiplexing terminal sink that can route the
messages to the correct destination.

>             2.2 Which means the default support for channel is also not of
> much use

How do you propose the platform be modified in order to support this use-case?

Thanks,
Arvind

On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
<pr...@yahoo.com> wrote:
> They are in low 100's in the best case scenario, and could be in 1000 in the
> worst case scenario.
>
> I believe this aspect can be pretty much shielded from application if the
> underlying platform has the right set of responsibilities.
>
>
> --
> Regards,
> Praveen Ramachandra
>
>
>
> ________________________________
> From: Ralph Goers <ra...@dslextreme.com>
> To: flume-user@incubator.apache.org
> Sent: Monday, January 9, 2012 6:53 PM
> Subject: Re: Flume-NG Channels
>
>
> On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
>
> Hi,
>
> We were trying to design a multi-tenanted system using flume-ng, where each
> logically independent data set is modelled through a channel going through
> the system of collectors, aggregators and delivery agents (to end
> destination). Each channel will carry data that logically belong together.
> The requirement is that we should be able to bring up and tear down a
> channel with ease.
>
>
> When we completed the exercise, it turned out that we have to run a separate
> Source/Sink, at a designated host/port combination for each channel. The
> issue with this is that, it is an operational overhead that we have work
> with net-ops to punch holes in the firewall to let tcp traffic flow on
> non-standard ports. I would imagine that it would be the case in many
> organizations as well.
>
> Two questions.
>
> 1. Did I get the modeling right with flume-ng
> 2. Is there a better way to do it at a platform level
>             2.1 I know if I can write a bunch of custom sinks/sources and
> embed a notion of channel to which each events belong to in the message, I
> can effectively mux and demux the events at either ends.
>             2.2 Which means the default support for channel is also not of
> much use
>
>
> What is your target destination(s) for the tenants?  Can they all flow
> through a single channel in Flume and then be delivered to the correct
> destination by a smarter sink at the end?
>
> Ralph
>
>

Re: Flume-NG Channels

Posted by Arvind Prabhakar <ar...@apache.org>.

Hi Praveen,

First to your question:

> Did I get the modeling right with flume-ng

More or less, yes. The one distinction that I would like to point out
is that having separate source-sink end points for individual channels
stems more from your requirement than from the design of flume. A
channel in the flume implementation does not care how many sources write
to it or how many sinks read from it.

> 2. Is there a better way to do it at a platform level
>             2.1 I know if I can write a bunch of custom sinks/sources and
> embed a notion of channel to which each events belong to in the message, I
> can effectively mux and demux the events at either ends.

The key issue here is the layering of a multi-tenant semantic on top
of flows. Since flume is fundamentally not aware of the contents of
the events in a flow, and does not expose any client auth/id model,
there is no inherent support for doing this out of the box.

Moreover, from your description it seems that the channels that
logically separate out the flows will operate within the same agent.
If that is the case, then it may be a better option to use a single
channel and have a multiplexing terminal sink that can route the
messages to the correct destination.
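
To illustrate the single-channel suggestion above, here is a minimal sketch in plain Python. It does not use any Flume API: the MultiplexingSink class, the "tenant" header name, and the use of a deque as the channel are all assumptions made for illustration. All tenants share one channel, and the terminal sink picks the destination from a header on each event.

```python
from collections import deque


class MultiplexingSink:
    """Drains one shared channel and routes each event by a header value."""

    def __init__(self, header, default=None):
        self.header = header    # hypothetical routing header, e.g. "tenant"
        self.default = default  # destination for unmapped events; None drops them
        self.routes = {}

    def add_route(self, value, destination):
        self.routes[value] = destination

    def process(self, channel):
        """Route every event currently in the channel; return how many were delivered."""
        delivered = 0
        while channel:
            event = channel.popleft()
            key = event["headers"].get(self.header)
            destination = self.routes.get(key, self.default)
            if destination is not None:
                destination.append(event["body"])
                delivered += 1
        return delivered


# Usage: one shared channel, two tenant destinations, one catch-all.
channel = deque([
    {"headers": {"tenant": "a"}, "body": "a-1"},
    {"headers": {"tenant": "b"}, "body": "b-1"},
    {"headers": {}, "body": "untagged"},
])
tenant_a, tenant_b, unrouted = [], [], []
sink = MultiplexingSink(header="tenant", default=unrouted)
sink.add_route("a", tenant_a)
sink.add_route("b", tenant_b)
delivered = sink.process(channel)
```

The lookup is a single dictionary access per event, which keeps the work done at the de-multiplexing point small.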

>             2.2 Which means the default support for channel is also not of
> much use

How do you propose the platform be modified in order to support this use-case?

Thanks,
Arvind



On Mon, Jan 9, 2012 at 9:36 PM, Praveen Ramachandra
<pr...@yahoo.com> wrote:
> They are in low 100's in the best case scenario, and could be in 1000 in the
> worst case scenario.
>
> I believe this aspect can be pretty much shielded from application if the
> underlying platform has the right set of responsibilities.
>
>
> --
> Regards,
> Praveen Ramachandra
>
>
>
> ________________________________
> From: Ralph Goers <ra...@dslextreme.com>
> To: flume-user@incubator.apache.org
> Sent: Monday, January 9, 2012 6:53 PM
> Subject: Re: Flume-NG Channels
>
>
> On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:
>
> Hi,
>
> We were trying to design a multi-tenanted system using flume-ng, where each
> logically independent data set is modelled through a channel going through
> the system of collectors, aggregators and delivery agents (to end
> destination). Each channel will carry data that logically belong together.
> The requirement is that we should be able to bring up and tear down a
> channel with ease.
>
>
> When we completed the exercise, it turned out that we have to run a separate
> Source/Sink, at a designated host/port combination for each channel. The
> issue with this is that, it is an operational overhead that we have work
> with net-ops to punch holes in the firewall to let tcp traffic flow on
> non-standard ports. I would imagine that it would be the case in many
> organizations as well.
>
> Two questions.
>
> 1. Did I get the modeling right with flume-ng
> 2. Is there a better way to do it at a platform level
>             2.1 I know if I can write a bunch of custom sinks/sources and
> embed a notion of channel to which each events belong to in the message, I
> can effectively mux and demux the events at either ends.
>             2.2 Which means the default support for channel is also not of
> much use
>
>
> What is your target destination(s) for the tenants?  Can they all flow
> through a single channel in Flume and then be delivered to the correct
> destination by a smarter sink at the end?
>
> Ralph
>
>

Re: Flume-NG Channels

Posted by Praveen Ramachandra <pr...@yahoo.com>.

They are in the low 100s in the best-case scenario, and could be in the 1000s in the worst-case scenario.

I believe this aspect can be pretty much shielded from the application if the underlying platform has the right set of responsibilities.


--
Regards,
Praveen Ramachandra





________________________________
 From: Ralph Goers <ra...@dslextreme.com>
To: flume-user@incubator.apache.org 
Sent: Monday, January 9, 2012 6:53 PM
Subject: Re: Flume-NG Channels
 



On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:

Hi,
>
>
>We were trying to design a multi-tenanted system using flume-ng, where each logically independent data set is modelled through a channel going through the system of collectors, aggregators and delivery agents (to end destination). Each channel will carry data that logically belong together. The requirement is that we should be able to bring up and tear down a channel with ease.
>
>
>
>
>When we completed the exercise, it turned out that we have to run a separate Source/Sink, at a designated host/port combination for each channel. The issue with this is that, it is an operational overhead that we have work with net-ops to punch holes in the firewall to let tcp traffic flow on non-standard ports. I would imagine that it would be the case in many organizations as well.
>
>
>Two questions.
>
>
>1. Did I get the modeling right with flume-ng
>2. Is there a better way to do it at a platform level
>            2.1 I know if I can write a bunch of custom sinks/sources and embed a notion of channel to which each events belong to in the message, I can effectively mux and demux the events at either ends.
>            2.2 Which means the default support for channel is also not of much use

What is your target destination(s) for the tenants?  Can they all flow through a single channel in Flume and then be delivered to the correct destination by a smarter sink at the end?

Ralph

Re: Flume-NG Channels

Posted by Ralph Goers <ra...@dslextreme.com>.

On Jan 9, 2012, at 2:28 AM, Praveen Ramachandra wrote:

> Hi,
> 
> We were trying to design a multi-tenanted system using flume-ng, where each logically independent data set is modelled through a channel going through the system of collectors, aggregators and delivery agents (to end destination). Each channel will carry data that logically belong together. The requirement is that we should be able to bring up and tear down a channel with ease.
> 
> 
> When we completed the exercise, it turned out that we have to run a separate Source/Sink, at a designated host/port combination for each channel. The issue with this is that, it is an operational overhead that we have work with net-ops to punch holes in the firewall to let tcp traffic flow on non-standard ports. I would imagine that it would be the case in many organizations as well.
> 
> Two questions.
> 
> 1. Did I get the modeling right with flume-ng
> 2. Is there a better way to do it at a platform level
>             2.1 I know if I can write a bunch of custom sinks/sources and embed a notion of channel to which each events belong to in the message, I can effectively mux and demux the events at either ends.
>             2.2 Which means the default support for channel is also not of much use

What are your target destination(s) for the tenants?  Can they all flow through a single channel in Flume and then be delivered to the correct destination by a smarter sink at the end?

Ralph

Flume-NG Channels

Posted by Praveen Ramachandra <pr...@yahoo.com>.

Hi,

We were trying to design a multi-tenanted system using flume-ng, where each logically independent data set is modelled as a channel going through the system of collectors, aggregators and delivery agents (to the end destination). Each channel will carry data that logically belongs together. The requirement is that we should be able to bring up and tear down a channel with ease.


When we completed the exercise, it turned out that we have to run a separate Source/Sink, at a designated host/port combination, for each channel. The issue with this is the operational overhead: we have to work with net-ops to punch holes in the firewall to let TCP traffic flow on non-standard ports. I would imagine that this would be the case in many organizations as well.

Two questions.

1. Did I get the modeling right with flume-ng?
2. Is there a better way to do it at the platform level?
            2.1 I know that if I write a bunch of custom sinks/sources and embed in the message a notion of the channel to which each event belongs, I can effectively mux and demux the events at either end.
            2.2 Which means the default support for channels is also not of much use.

Re: Monitoring FlumeNG

Posted by Eric Sammer <es...@cloudera.com>.

Otis:

In 1.0.0 there isn't support (yet) for either JMX or JSON-ized metrics,
unfortunately. I just didn't have the time so it got pushed to 1.1.0. Still
not sure which will happen (whatever is easier) but that's the plan. For
now, there isn't a good way to extract metrics from NG (at least not until
you kill the daemon and it spits metrics info to the logs which isn't
really helpful except for debugging / testing).

On Thu, Jan 5, 2012 at 2:51 PM, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:

> Hi,
>
> I'm interested in monitoring Flume NG and am wondering if one should do so
> by grabbing the JSONized metrics as described on this 6 months old page:
> https://cwiki.apache.org/confluence/display/FLUME/Monitoring+Flume
>
> Or via JMX, which is mentioned here:
>
> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG#FlumeNG-CriticalFeatures
>
>
> Or wait for 1.1.0:
> https://issues.apache.org/jira/browse/FLUME-749
> https://issues.apache.org/jira/browse/FLUME-748
>
> Thanks,
> Otis
>
>

-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com