Posted to user@flume.apache.org by Mohit Durgapal <du...@gmail.com> on 2014/08/14 09:25:34 UTC
how to load balance flume
I have a requirement where I need to feed push traffic (comma-separated
logs) to Flume at a very high rate.
I have three concerns:
1. I am using PHP to send events to Flume through rsyslog. The code I am
using is:
    openlog("mylogs", LOG_NDELAY, LOG_LOCAL2);
    syslog(LOG_INFO, "aaid,bid,cid,info1,info2,....");
    closelog();
I want to add some fields as headers to the above event
("aaid,bid,cid,info1,info2,...."). I don't see any function in PHP that
lets me add headers for some fields, so that I can take some action based
on the headers alone without parsing the complete message.
2. How do I load balance the traffic? I want the logger to forward the
logs to a load balancer, and the load balancer to choose a Flume node
(based on factors such as current load and CPU utilization) and also
handle failures (divert traffic if a Flume node goes down).
I looked at Flume's built-in load balancer, but it provides just two
options: round-robin and random. Any ideas on how I could do this kind of
load balancing, with failure detection and handling, would be very
helpful.
3. I want to update a cache in real time from Flume (using an interceptor).
I want a hashing-based approach that diverts certain traffic (based on a
field or header in the log) to certain nodes, so that one node is
responsible for updating all rows whose keys fall in the same hash bucket.
This is to avoid row-level locking.
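
A minimal sketch of the hashing idea in point 3, assuming the routing key is
the first comma-separated field of the log line and that the number of nodes
is fixed. The class and method names are hypothetical and are not part of
Flume; only the JDK's CRC32 is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper (not a Flume class): maps a routing key to a node index. */
final class HashBucketRouter {
    private final int nodeCount;

    HashBucketRouter(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        this.nodeCount = nodeCount;
    }

    /** Returns the index (0 .. nodeCount-1) of the node that owns this key. */
    int nodeFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the remainder is non-negative.
        return (int) (crc.getValue() % nodeCount);
    }

    /** Extracts the first comma-separated field (assumed to be the row key). */
    static String firstField(String csvLine) {
        int comma = csvLine.indexOf(',');
        return comma < 0 ? csvLine : csvLine.substring(0, comma);
    }

    public static void main(String[] args) {
        HashBucketRouter router = new HashBucketRouter(3);
        String key = firstField("aaid,bid,cid,info1,info2");
        System.out.println(key + " -> node " + router.nodeFor(key));
    }
}
```

The same key always maps to the same node, so only one node touches a given
row. Note that plain modulo hashing remaps most keys whenever the node count
changes. Within Flume, the computed bucket could be stored in an event header
and routed on with the multiplexing channel selector, though that wiring is
not shown here.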
I hope I have explained my requirements well enough for everyone to
understand. If they are not as clear as I think, please let me know.
Regards
Mohit
Re: how to load balance flume
Posted by Sharninder <sh...@gmail.com>.
I'm not sure without looking at the exact use case, but maybe you can use
something like HAProxy?
--
Sharninder
Re: how to load balance flume
Posted by Mohit Durgapal <du...@gmail.com>.
Hi Sharninder,
Thanks for the response. The load balancing is not based on headers. To
simplify, let's say I have one web server generating logs and three Flume
nodes receiving those logs. I want the load to be balanced across those
three Flume nodes based on CPU utilization and load.
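
The selection rule described above (pick the least-loaded node and skip any
node that is down) can be sketched as follows. This is only an illustration
of the decision logic: the Node type, its fields, and the assumption that
some external monitor supplies CPU load and health status are all
hypothetical, and nothing here is a Flume or HAProxy API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: choose the healthy node with the lowest reported load. */
final class LeastLoadedPicker {

    /** Snapshot of one Flume node, assumed to come from an external monitor. */
    static final class Node {
        final String name;
        final double cpuLoad;   // assumed range 0.0 .. 1.0
        final boolean healthy;  // false if the last health check failed

        Node(String name, double cpuLoad, boolean healthy) {
            this.name = name;
            this.cpuLoad = cpuLoad;
            this.healthy = healthy;
        }
    }

    /** Returns the healthy node with the lowest load, or empty if all are down. */
    static Optional<Node> pick(List<Node> nodes) {
        Node best = null;
        for (Node n : nodes) {
            if (!n.healthy) {
                continue; // failure handling: never route to a node that is down
            }
            if (best == null || n.cpuLoad < best.cpuLoad) {
                best = n;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("flume1", 0.80, true),
                new Node("flume2", 0.10, false), // down, so it is skipped
                new Node("flume3", 0.35, true));
        System.out.println(pick(nodes).map(n -> n.name).orElse("no healthy node"));
    }
}
```

A real deployment would also need a way to collect the load figures and to
run health checks, which is the part an external balancer normally provides.
For failure handling alone, Flume's load-balancing sink processor has a
backoff option that temporarily stops sending to sinks that have failed.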
Re: how to load balance flume
Posted by Sharninder <sh...@gmail.com>.
To add headers to the events, you can either send properly formatted Avro
packets (which have a header) to an Avro source, or implement a custom
interceptor that adds headers after the events are received by the syslog
source. There is a static interceptor bundled with Flume that you can use.
The problem with it is that each static interceptor adds only a single
header (key -> value), as far as I know. But it's a good starting point
for what you want to do.
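
As a rough sketch of what the custom-interceptor route might involve, the
snippet below turns the leading fields of the comma-separated body into a
header map. It is plain Java with no Flume dependency; the class name and
the header names (taken from the sample line "aaid,bid,cid,...") are
assumptions. In an actual interceptor, logic like this would be called from
intercept(Event) and the result added to event.getHeaders().

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: derives header key/value pairs from a CSV event body. */
final class CsvHeaderExtractor {
    // Assumed names for the leading fields, based on the sample line in the thread.
    private static final String[] HEADER_FIELDS = {"aaid", "bid", "cid"};

    /** Maps the first fields of the body to headers; the rest is left unparsed. */
    static Map<String, String> extractHeaders(String body) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Limit the split so everything after the header fields stays in one piece.
        String[] parts = body.split(",", HEADER_FIELDS.length + 1);
        for (int i = 0; i < HEADER_FIELDS.length && i < parts.length; i++) {
            headers.put(HEADER_FIELDS[i], parts[i].trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        System.out.println(extractHeaders("a1,b2,c3,info1,info2"));
    }
}
```

This assumes the event body is the bare CSV line. Depending on how rsyslog
formats the message, the body may carry a tag or other prefix that would
have to be stripped first.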
I didn't really understand your load balancing requirement, but if it's
based on the headers, you'll have to write your own interceptors.