Posted to user@flume.apache.org by Evan Chan <ev...@ooyala.com> on 2011/06/26 03:31:33 UTC

Roadmap / Partitioning by key

Hi Flume community,

I hope that the incubator list is being read....  hello to everyone, I'm new
to Flume.

Is there a roadmap for future development of Flume?

I'm interested in particular to see if the ability to have a sink that can
route events to different nodes based on a key (something that Yahoo S4 can
do) will be in the roadmap, and how hard it would be to develop a feature
like that.

thanks!
Evan

-- 
--
*Evan Chan*
Senior Software Engineer |
ev@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala<http://www.twitter.com/ooyala>

Re: Roadmap / Partitioning by key

Posted by Evan Chan <ev...@ooyala.com>.

Jon,

Thanks for the response, my replies are inlined.

On Sun, Jun 26, 2011 at 11:07 PM, Jonathan Hsieh <jo...@cloudera.com> wrote:

> Evan,
>
> A basic ability to demultiplex (demux) events exists today but is only
> available for writing files to different dirs in HDFS.  The ability to do
> content-based routing for computational purposes is not currently on the
> road map.  While it would be architecturally possible to demux in Flume,
> Flume is currently focused on sending data from many places to a few.
>
> Can you describe your use case, or what you would want to do if you had
> this capability?  This would help us frame this discussion.
>

Let's say we wanted to do some data aggregation according to a key such as
IP address or domain name.  Since the number of keys is large, aggregation
within each node would not be very efficient unless the number of keys per
node could be reduced, or there were some sort of really fast distributed
key cache shared across all nodes.
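
As a rough illustration of what I mean by routing on a key, here is a
minimal sketch in Python. It is not Flume code and not how S4 does it; the
names node_for_key, route and aggregate are made up for this example, and
hashing the key modulo the node count is just one assumed way to partition.

```python
import hashlib
from collections import Counter, defaultdict


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def route(events, num_nodes):
    """Group (key, value) events by the node that owns each key."""
    per_node = defaultdict(list)
    for key, value in events:
        per_node[node_for_key(key, num_nodes)].append((key, value))
    return per_node


def aggregate(events):
    """Sum values per key; each node only sees the keys routed to it."""
    counts = Counter()
    for key, value in events:
        counts[key] += value
    return counts


# Example events keyed by IP address or domain name.
events = [("10.0.0.1", 1), ("example.com", 1), ("10.0.0.1", 1), ("10.0.0.2", 1)]
per_node = route(events, num_nodes=3)
totals = {node: aggregate(evts) for node, evts in per_node.items()}
```

Because every event for a given key lands on the same node, each node
aggregates only its own share of the key space and no cross-node cache is
needed.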


>
> If there is a small, finite number of categories, demuxing could
> potentially be built as plugins for today's Flume.  For something more general
> or adaptive, a larger development effort would be required.
>

I might be interested in helping with that effort and in learning what it
entails, but this may be a discussion more appropriate for the dev mailing list.


>
> Another approach that could be done today would be to send data from Flume
> to a system that does demux and custom routing (starting to go down the
> complex-event-processing path):
>
> 1) Flume could potentially connect to S4 and deliver data to it.  Flume could
> have a path that delivers to HDFS, and have another copy sent to S4.
>

Ah yes, S4.  At this point though, it seems Flume is more mature than S4.


> 2) Flume could send data to FlumeBase (a system built on top of Flume)
> which may (or may not) provide this capability.
>

FlumeBase doesn't have much documentation, so from what I can tell it
wouldn't have this capability.


> 3) Flume could send data to an open-source system called Esper. (I don't
> know much about it currently)
>

Esper does sound interesting, but I believe it is single-node.

thanks!
Evan


>
> Jon
>
>
>
>
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com
>
>
>


-- 
--
*Evan Chan*
Senior Software Engineer |
ev@ooyala.com | (650) 996-4600
www.ooyala.com | blog <http://www.ooyala.com/blog> |
@ooyala<http://www.twitter.com/ooyala>

Re: Roadmap / Partitioning by key

Posted by Jonathan Hsieh <jo...@cloudera.com>.

Evan,

A basic ability to demultiplex (demux) events exists today but is only
available for writing files to different dirs in HDFS.  The ability to do
content-based routing for computational purposes is not currently on the
road map.  While it would be architecturally possible to demux in Flume,
Flume is currently focused on sending data from many places to a few.

Can you describe your use case, or what you would want to do if you had
this capability?  This would help us frame this discussion.

If there is a small, finite number of categories, demuxing could
potentially be built as plugins for today's Flume.  For something more general
or adaptive, a larger development effort would be required.
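
To make the small-finite-categories case concrete, here is a minimal sketch
in Python.  It does not use Flume's plugin API; the routing table, the sink
names and the demux function are all hypothetical and only show the idea of
a fixed category-to-destination mapping with a default.

```python
from collections import defaultdict

# Hypothetical fixed routing table: event category -> destination name.
ROUTES = {"error": "alerts", "access": "metrics"}
DEFAULT_SINK = "archive"


def demux(events, routes=ROUTES, default=DEFAULT_SINK):
    """Split events into per-destination lists using a fixed category table."""
    outputs = defaultdict(list)
    for event in events:
        sink = routes.get(event.get("category"), default)
        outputs[sink].append(event)
    return dict(outputs)


events = [
    {"category": "error", "body": "disk full"},
    {"category": "access", "body": "GET /index.html"},
    {"category": "debug", "body": "cache miss"},
]
outputs = demux(events)
```

A fixed table like this only works while the set of categories is known up
front, which is why the general or adaptive case is a larger effort.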

Another approach that could be done today would be to send data from Flume
to a system that does demux and custom routing (starting to go down the
complex-event-processing path):

1) Flume could potentially connect to S4 and deliver data to it.  Flume could
have a path that delivers to HDFS, and have another copy sent to S4.
2) Flume could send data to FlumeBase (a system built on top of Flume) which
may (or may not) provide this capability.
3) Flume could send data to an open-source system called Esper. (I don't
know much about it currently)

Jon

-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com