Posted to dev@flume.apache.org by Israel Ekpo <is...@aicer.org> on 2013/05/22 23:08:09 UTC

[FLUME-1995] Remote Channel for Apache Flume

I wanted to get some feedback from others before deciding whether or not to
continue working on this.

I initially filed this improvement/new feature because of use cases where
there is a hardware failure on the machine where the agent is currently
running.

In terms of disaster recovery, having the events queue up on a remote
machine (preferably in the same internal network) would allow another agent
with the same configuration to pick them up from a different machine and
resume transporting the data towards the sink.
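
To make the failover idea concrete, here is a minimal in-memory sketch in
Python. It is not Flume code and does not use any Flume API: RemoteQueue,
release and drain are hypothetical names, and a real remote channel would keep
its state in an external store rather than in process memory. It only shows
how events taken by a failed agent could be handed back for a second agent to
deliver.

```python
from collections import deque


class RemoteQueue:
    """Hypothetical stand-in for a queue hosted on a remote store."""

    def __init__(self):
        self._pending = deque()  # events waiting for delivery
        self._in_flight = {}     # agent_id -> events taken but not yet committed

    def put(self, event):
        self._pending.append(event)

    def take(self, agent_id):
        """Hand the oldest pending event to an agent and track it until commit."""
        if not self._pending:
            return None
        event = self._pending.popleft()
        self._in_flight.setdefault(agent_id, []).append(event)
        return event

    def commit(self, agent_id):
        """The agent delivered its in-flight events to the sink; forget them."""
        self._in_flight.pop(agent_id, None)

    def release(self, agent_id):
        """The agent is presumed dead; requeue its uncommitted events in order."""
        lost = self._in_flight.pop(agent_id, [])
        self._pending.extendleft(reversed(lost))


def drain(queue, agent_id, sink):
    """Deliver every pending event to the sink, committing after each one."""
    while (event := queue.take(agent_id)) is not None:
        sink.append(event)
        queue.commit(agent_id)
```

In this sketch, if "agent-a" takes an event and its machine dies before the
commit, calling release("agent-a") puts that event back at the head of the
queue, and drain(queue, "agent-b", sink) on another machine delivers it along
with everything still pending.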

Sometimes, events may take a while to process and they may end up staying
in the channels (FileChannel) for a long time, during which hardware
failure could occur.

If the data in the events is mission critical, this could cause a lot of
headaches if there is no easy way to recover from the hardware failure
after events have been queued up in the file channel.

What are your thoughts on the remote channel? I understand there is a
JDBC Channel (http://flume.apache.org/FlumeUserGuide.html#jdbc-channel) but
I have heard it has performance issues.

This is why I am planning to use a NoSQL store to solve this.

I would like to get some feedback from others so that I can prioritize the
tasks in my JIRA queue, especially with the 1.4.0 release deadline drawing
nearer.

Thanks.

Re: [FLUME-1995] Remote Channel for Apache Flume

Posted by Mike Percy <mp...@apache.org>.
Hi Israel,
Some people deploy using RAID for that. We could also add software-level mirroring support to the FC. Why not do that?
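
As a rough illustration of what software-level mirroring could look like, the
Python sketch below appends each event to the same log file in several
directories and recovers from whichever copy is most complete. It is not how
the FileChannel is implemented: the record layout (4-byte length, 4-byte
CRC32, payload) and the names append_mirrored, read_log and recover are
assumptions made up for this example.

```python
import os
import zlib


def append_mirrored(dirs, name, payload):
    """Append one checksummed record to the same log file in every directory."""
    header = len(payload).to_bytes(4, "big") + zlib.crc32(payload).to_bytes(4, "big")
    for d in dirs:
        with open(os.path.join(d, name), "ab") as f:
            f.write(header + payload)
            f.flush()
            os.fsync(f.fileno())  # push the record to disk before moving on


def read_log(path):
    """Return the payloads from one replica, stopping at the first damaged record."""
    events = []
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return events  # a missing or unreadable replica yields no events
    pos = 0
    while pos + 8 <= len(data):
        size = int.from_bytes(data[pos:pos + 4], "big")
        crc = int.from_bytes(data[pos + 4:pos + 8], "big")
        payload = data[pos + 8:pos + 8 + size]
        if len(payload) != size or zlib.crc32(payload) != crc:
            break
        events.append(payload)
        pos += 8 + size
    return events


def recover(dirs, name):
    """Use whichever replica yields the most intact records."""
    return max((read_log(os.path.join(d, name)) for d in dirs), key=len)
```

With the directories placed on different disks, losing or corrupting one copy
still leaves recover() able to return the events from the other.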

Regards,
Mike
