Posted to user@flume.apache.org by Sharninder <sh...@gmail.com> on 2014/10/16 06:45:02 UTC

Flume Syslog source

Hi Guys,

I'm trying to implement a system to archive syslogs using Flume. I've
played around with it a bit but haven't been able to figure out a
way to segregate logs according to the host they're coming from. Is there a
way for me to add the hostname to the event header somehow? I could then use
either an interceptor to read the header or a custom sink to deal with
events based on the hostname.

--
Sharninder

Re: Flume Syslog source

Posted by Sharninder <sh...@gmail.com>.
Yes, I did think of that, but it seems like a hack and doesn't scale
well.

Ideally, I should be able to just read the remote host from the TCP
connection somewhere and add that info to the Flume event header.
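
To illustrate the idea outside of Flume: a TCP server already knows the address of the peer for every accepted connection, so that address can be copied into a per-event header. The following is a minimal sketch using plain Python sockets, not Flume's API. The header name `remote_host`, the event layout, and the sample message are assumptions made for the example.

```python
import socket


def tag_with_peer(conn: socket.socket, peer_address) -> list:
    """Read newline-delimited messages and attach the sender's IP as a header."""
    peer_ip = peer_address[0]  # (ip, port) pair reported by accept()
    data = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # peer closed the connection
            break
        data += chunk
    events = []
    for line in data.splitlines():
        if line:
            # "remote_host" is a hypothetical header name chosen for this sketch.
            events.append({"headers": {"remote_host": peer_ip}, "body": line})
    return events


def demo() -> list:
    """Send one message over loopback and return the tagged events."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    # The kernel completes the handshake and buffers the data before accept().
    client = socket.create_connection(("127.0.0.1", port))
    client.sendall(b"<34>Oct 16 06:45:02 fw01 sshd[42]: test message\n")
    client.close()

    conn, peer_address = listener.accept()
    try:
        return tag_with_peer(conn, peer_address)
    finally:
        conn.close()
        listener.close()
```

One caveat with this approach: if the logs pass through a relay or NAT, the peer address is that of the relay, not of the device that produced the message.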

--
Sharninder


On Thu, Oct 16, 2014 at 10:17 AM, Hari Shreedharan <
hshreedharan@cloudera.com> wrote:

> The Multiport syslog source can add the port number on which the data was
> received to the event headers. You can use it with a multiplexing channel
> selector to route events to different channels.
>
> Thanks,
> Hari

Re: Flume Syslog source

Posted by Sharninder <sh...@gmail.com>.
Thanks Jeff. I'll take a look at the multiport source too.

On Thu, Oct 16, 2014 at 8:52 PM, Jeff Lord <jl...@cloudera.com> wrote:

> You will get better perf out of the multiport syslog source

Re: Flume Syslog source

Posted by Jean <la...@yahoo.fr>.
Hi,
Why is the multiport syslog source better than the standard syslog source?
I have many agents with a syslog source (>5M events/day) and haven't noticed any performance problems.
Jean
> On 16 Oct 2014, at 17:22, Jeff Lord <jl...@cloudera.com> wrote:
> 
> You will get better perf out of the multiport syslog source

Re: Flume Syslog source

Posted by Jeff Lord <jl...@cloudera.com>.
You will get better perf out of the multiport syslog source

On Wednesday, October 15, 2014, Sharninder <sh...@gmail.com> wrote:

> I just looked at the existing syslogtcp source, and it seems it does take
> pains to parse the hostname from the message. I think that is the best
> bet for me. Of course, it might fail for a few devices, but I'll just have
> to think of something else for those.
>
> --
> Sharninder

Re: Flume Syslog source

Posted by Sharninder <sh...@gmail.com>.
I just looked at the existing syslogtcp source, and it seems it does take
pains to parse the hostname from the message. I think that is the best
bet for me. Of course, it might fail for a few devices, but I'll just have
to think of something else for those.

--
Sharninder


On Thu, Oct 16, 2014 at 10:40 AM, Sharninder <sh...@gmail.com> wrote:

> Yes Jeff. That's a possibility, but I'm pretty sure there will be some
> random device that doesn't send its logs in the proper format, and my
> regex will break. This is the way I'll implement it if I can't find
> anything better.
>
> Thanks,
> Sharninder

Re: Flume Syslog source

Posted by Sharninder <sh...@gmail.com>.
Yes Jeff. That's a possibility, but I'm pretty sure there will be some
random device that doesn't send its logs in the proper format, and my
regex will break. This is the way I'll implement it if I can't find
anything better.

Thanks,
Sharninder


On Thu, Oct 16, 2014 at 10:22 AM, Jeff Lord <jl...@cloudera.com> wrote:

> You can also use a regex interceptor to extract hostname from the message
> (assuming it's there) and put that in an event header. From there you can
> route and create partitions with the header.

Re: Flume Syslog source

Posted by Jeff Lord <jl...@cloudera.com>.
You can also use a regex interceptor to extract hostname from the message
(assuming it's there) and put that in an event header. From there you can
route and create partitions with the header.
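
As a rough illustration of what such an extraction does (in plain Python, not Flume's interceptor configuration), the sketch below pulls the hostname out of a BSD-style syslog line and stores it in a header. The regex, the header name `host`, the `unknown` fallback, and the event layout are all assumptions for the example; real devices may deviate from this format.

```python
import re

# Assumed BSD-style layout: "<PRI>Mmm dd hh:mm:ss HOSTNAME TAG: message"
SYSLOG_HOST_RE = re.compile(
    r"^(?:<\d{1,3}>)?"            # optional priority, e.g. <34>
    r"[A-Z][a-z]{2}\s+\d{1,2}\s"  # month and day, e.g. "Oct 16"
    r"\d{2}:\d{2}:\d{2}\s"        # time of day
    r"(?P<host>[^\s:]+)\s"        # hostname: no whitespace, no colon
)


def add_host_header(event: dict, default: str = "unknown") -> dict:
    """Return a copy of the event with a 'host' header taken from the body."""
    match = SYSLOG_HOST_RE.match(event["body"])
    # Messages that do not fit the pattern get the fallback value.
    host = match.group("host") if match else default
    headers = dict(event.get("headers", {}))
    headers["host"] = host
    return {"headers": headers, "body": event["body"]}


if __name__ == "__main__":
    sample = {"headers": {}, "body": "<34>Oct 16 06:45:02 fw01 sshd[42]: login failed"}
    print(add_host_header(sample)["headers"])
```

Giving non-matching messages a fallback value keeps them routable, for example to a catch-all partition, so they are not dropped.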

On Wednesday, October 15, 2014, Hari Shreedharan <hs...@cloudera.com>
wrote:

> The Multiport syslog source can add the port number on which the data was
> received to the event headers. You can use it with a multiplexing channel
> selector to route events to different channels.
>
> Thanks,
> Hari

Re: Flume Syslog source

Posted by Hari Shreedharan <hs...@cloudera.com>.
The Multiport syslog source can add the port number on which the data was received to the event headers. You can use it with a multiplexing channel selector to route events to different channels.
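
The routing logic this describes can be sketched in plain Python as a lookup from a header value to a channel name. This is only an illustration of the idea, not Flume code: the header name `port`, the port numbers, and the channel names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping: each device (or group of devices) sends to its own port.
PORT_TO_CHANNEL = {
    "10001": "firewall_channel",
    "10002": "router_channel",
}
DEFAULT_CHANNEL = "default_channel"


def select_channel(event: dict, header: str = "port") -> str:
    """Pick a channel from the event's port header, falling back to a default."""
    value = event.get("headers", {}).get(header)
    return PORT_TO_CHANNEL.get(value, DEFAULT_CHANNEL)


def route(events: list) -> dict:
    """Group events by the channel selected for each one."""
    channels = defaultdict(list)
    for event in events:
        channels[select_channel(event)].append(event)
    return dict(channels)
```

The mapping needs one entry, and one listening port, per device or device group that should be separated.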


Thanks,
Hari

On Wed, Oct 15, 2014 at 9:45 PM, Sharninder <sh...@gmail.com> wrote:

> Hi Guys,
> I'm trying to implement a system to archive syslogs using Flume. I've
> played around with it a bit but haven't been able to figure out a
> way to segregate logs according to the host they're coming from. Is there a
> way for me to add the hostname to the event header somehow? I could then use
> either an interceptor to read the header or a custom sink to deal with
> events based on the hostname.
> --
> Sharninder