Posted to user@flume.apache.org by Filip Slunecko <fi...@gmail.com> on 2012/10/16 23:27:55 UTC

syslog source - sinks without datetime/hostname

Hi,

I am trying to use a syslog source and sink it to HDFS or fileroller.
Everything is working, but the "saved" logs lack timestamp and
hostname information.
Is it possible to force flume-ng to dump that information from the syslog
header together with the body lines?

I am using flume-ng-agent-1.2.0+24.4-1.noarch from Cloudera repository.

Thanks,

Filip

Re: syslog source - sinks without datetime/hostname

Posted by Hari Shreedharan <hs...@cloudera.com>.
Hi Roshan,

I believe Filip is asking about the hostname of the machine that
generated the syslog event and the timestamp at which the event was
generated, not the time the Flume agent received it. The syslog RFC
defines both as part of the syslog header. Flume's syslog sources read
that header, put this information into the Flume event headers, and
put the message/body part of the syslog event into the Flume event
body. But since these values are in the headers, the TEXT serializer
(which is the default, and writes only the body) will not write them
to the output stream. To make sure they get written out, the
serializer needs to write the headers out as well.

If what is desired is the hostname of the Flume agent and the
timestamp at which that agent processed the event (I do *not* think
this is what Filip wants), a custom interceptor can insert this info,
but a serializer is still needed to make sure the headers are written
out to the stream. So either way, a custom serializer is needed to
write this data out.
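
For illustration, the formatting step that such a header-aware
serializer would perform can be sketched in plain Java. This is only a
sketch and not Flume code: the class and method names are made up, and
the header keys "timestamp" (epoch milliseconds) and "host" are
assumptions about what the syslog source stores. In a real serializer
the same logic would live in a class implementing Flume's
EventSerializer interface.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Illustration only: plain Java, no Flume classes involved. */
final class HeaderAwareTextSketch {

    /** What a body-only text serializer emits: the body and a newline. */
    static byte[] bodyOnly(byte[] body) {
        return concat(new byte[0], body);
    }

    /** Emits "time host body" plus a newline; a missing header becomes "-". */
    static byte[] withHeaders(Map<String, String> headers, byte[] body) {
        String millis = headers.get("timestamp"); // assumed key, epoch milliseconds
        String host = headers.get("host");        // assumed key
        String time = (millis == null)
                ? "-"
                : Instant.ofEpochMilli(Long.parseLong(millis)).toString();
        String prefix = time + " " + (host == null ? "-" : host) + " ";
        return concat(prefix.getBytes(StandardCharsets.UTF_8), body);
    }

    /** Returns prefix + body + newline as one byte array. */
    private static byte[] concat(byte[] prefix, byte[] body) {
        byte[] line = new byte[prefix.length + body.length + 1];
        System.arraycopy(prefix, 0, line, 0, prefix.length);
        System.arraycopy(body, 0, line, prefix.length, body.length);
        line[line.length - 1] = '\n';
        return line;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("timestamp", "1350430075000"); // 2012-10-16 23:27:55 UTC
        headers.put("host", "web01");              // hypothetical origin host
        byte[] body = "su: session opened".getBytes(StandardCharsets.UTF_8);
        System.out.print(new String(bodyOnly(body), StandardCharsets.UTF_8));
        System.out.print(new String(withHeaders(headers, body), StandardCharsets.UTF_8));
    }
}
```

The first line printed shows what the default output looks like (body
only); the second shows the same event with the header values in front.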


Thanks,
Hari


On Tue, Oct 16, 2012 at 3:48 PM, Roshan Naik <ro...@hortonworks.com> wrote:
> Hari,
>    wouldn't a custom interceptor be more logical?
> -roshan

Re: syslog source - sinks without datetime/hostname

Posted by Filip Slunecko <fi...@gmail.com>.
Hi,

Thanks to all of you for your help!
Sadly, I had to choose another way to transfer the logs, even though I
like flume-ng a lot.
We don't have enough resources to support custom "plugins" for a
program written in Java.

Filip

On Wed, Oct 17, 2012 at 1:24 AM, Bhaskar V. Karambelkar
<bh...@gmail.com> wrote:
> Roshan,
> The problem is not at the event end; it's at the serializer end. The
> BodyTextEventSerializer, which is the default, does not serialize the
> headers, so no matter what headers you have or don't have, they won't
> affect the output of the default serializer.
> So you have two options: use the Avro event serializer (avro_event)
> and get an Avro data file, or write your own serializer that writes
> plain text but also serializes the headers along with the body,
> perhaps in a JSON or XML encoding.

Re: syslog source - sinks without datetime/hostname

Posted by "Bhaskar V. Karambelkar" <bh...@gmail.com>.
Roshan,
The problem is not at the event end; it's at the serializer end. The
BodyTextEventSerializer, which is the default, does not serialize the
headers, so no matter what headers you have or don't have, they won't
affect the output of the default serializer.
So you have two options: use the Avro event serializer (avro_event)
and get an Avro data file, or write your own serializer that writes
plain text but also serializes the headers along with the body,
perhaps in a JSON or XML encoding.
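
As a rough sketch of the second option, the snippet below encodes the
headers and the body as one JSON object per line. It is plain Java with
a hand-rolled string escaper rather than a JSON library, the class name
is made up, and the field names "headers" and "body" are arbitrary
choices for this example. It also assumes the body is UTF-8 text and
that header values are never null.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: writes headers and body as one JSON object per line. */
final class JsonLineSketch {

    /** Wraps a string in double quotes and escapes JSON special characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Builds {"headers":{...},"body":"..."} followed by a newline. */
    static String toJsonLine(Map<String, String> headers, byte[] body) {
        StringBuilder sb = new StringBuilder("{\"headers\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        sb.append("},\"body\":");
        sb.append(quote(new String(body, StandardCharsets.UTF_8)));
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("host", "web01");              // hypothetical values
        headers.put("timestamp", "1350430075000");
        byte[] body = "su: session opened".getBytes(StandardCharsets.UTF_8);
        System.out.print(toJsonLine(headers, body));
    }
}
```

One object per line keeps the output easy to split and parse later,
which is the main reason to prefer it over free-form text here.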

On Tue, Oct 16, 2012 at 6:48 PM, Roshan Naik <ro...@hortonworks.com> wrote:
> Hari,
>    wouldn't a custom interceptor be more logical?
> -roshan

Re: syslog source - sinks without datetime/hostname

Posted by Roshan Naik <ro...@hortonworks.com>.
Hari,
   wouldn't a custom interceptor be more logical?
-roshan


On Tue, Oct 16, 2012 at 3:36 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:

>  See the code for the serializers here:
>
> https://git-wip-us.apache.org/repos/asf?p=flume.git;a=tree;f=flume-ng-core/src/main/java/org/apache/flume/serialization;h=fcc07339b3cf0f5b8d1a75e978ffc1edbab28bfe;hb=HEAD
> You can use one of these as an example to write your own.
>
> The configuration documentation for HDFS Sink is here:
> http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
> You can specify a custom serializer by specifying its builder class in the
> config param "serializer".
>
>
> Thanks,
> Hari
>
> --
> Hari Shreedharan

Re: syslog source - sinks without datetime/hostname

Posted by Hari Shreedharan <hs...@cloudera.com>.
See the code for the serializers here: 
https://git-wip-us.apache.org/repos/asf?p=flume.git;a=tree;f=flume-ng-core/src/main/java/org/apache/flume/serialization;h=fcc07339b3cf0f5b8d1a75e978ffc1edbab28bfe;hb=HEAD
You can use one of these as an example to write your own.

The configuration documentation for HDFS Sink is here: http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
You can specify a custom serializer by specifying its builder class in the config param "serializer".
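
To show why the config names the builder class rather than the
serializer itself, here is a small stand-alone sketch of the pattern.
None of it is Flume code: SketchSerializer, HostPrefixSerializer and
SerializerLoader are made-up names, the "host" header key is an
assumption, and the config line in the comment uses a hypothetical
agent, sink and package name.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Illustration only: creates a serializer from the name of its builder class. */
final class SerializerLoader {

    /** Instantiates the named builder via its no-arg constructor and builds. */
    static SketchSerializer load(String builderClassName) {
        try {
            Class<?> cls = Class.forName(builderClassName);
            Object builder = cls.getDeclaredConstructor().newInstance();
            return ((SketchSerializer.Builder) builder).build();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load builder " + builderClassName, e);
        }
    }

    public static void main(String[] args) {
        // In a config file the nested class is written with '$', for example
        // (hypothetical names):
        //   agent.sinks.hdfsSink.serializer = com.example.HostPrefixSerializer$Builder
        SketchSerializer s = load(HostPrefixSerializer.Builder.class.getName());
        Map<String, String> headers = new HashMap<>();
        headers.put("host", "web01");
        byte[] out = s.serialize(headers, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.print(new String(out, StandardCharsets.UTF_8));
    }
}

/** Hypothetical stand-in for a serializer interface with a nested builder. */
interface SketchSerializer {
    byte[] serialize(Map<String, String> headers, byte[] body);

    interface Builder {
        SketchSerializer build();
    }
}

/** Example serializer: prefixes each body with the "host" header (assumed key). */
final class HostPrefixSerializer implements SketchSerializer {
    @Override
    public byte[] serialize(Map<String, String> headers, byte[] body) {
        String host = headers.getOrDefault("host", "-");
        String line = host + " " + new String(body, StandardCharsets.UTF_8) + "\n";
        return line.getBytes(StandardCharsets.UTF_8);
    }

    /** The class a config entry would point at. */
    static final class Builder implements SketchSerializer.Builder {
        @Override
        public SketchSerializer build() {
            return new HostPrefixSerializer();
        }
    }
}
```

The builder indirection means the loader only needs a no-argument
constructor, while the builder is free to configure the serializer
however it likes before returning it.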


Thanks,
Hari

-- 
Hari Shreedharan


On Tuesday, October 16, 2012 at 3:25 PM, Filip Slunecko wrote:

> @Roshan: thanks for the suggestion, I will look into it.
> 
> @Hari: I tried to google it and there is not much about it. I will
> look into it tomorrow and let you know (it's too late for serious
> work in my time zone :) )
> 
> Thanks to both of you for your quick responses and help!
> 
> Filip


Re: syslog source - sinks without datetime/hostname

Posted by Filip Slunecko <fi...@gmail.com>.
@Roshan: thanks for the suggestion, I will look into it.

@Hari: I tried to google it and there is not much about it. I will
look into it tomorrow and let you know (it's too late for serious
work in my time zone :) )

Thanks to both of you for your quick responses and help!

Filip

On Wed, Oct 17, 2012 at 12:00 AM, Hari Shreedharan
<hs...@cloudera.com> wrote:
> Hi Filip,
>
> The reason for this is that the Text serializer serializes only the
> body of the event, and the syslog sources write the body of the syslog
> event into the body of the Flume event. The hostname, timestamp,
> severity, etc. are added to the Flume event headers. You could simply
> write a serializer that writes out this information in the format you
> expect, and you will then see the headers in the files. You could also
> use the Avro serializer to serialize the events as Avro, which makes
> sure the headers are written out too.
>
> Hope this helps.
>
>
> Hari
>
> --
> Hari Shreedharan

Re: syslog source - sinks without datetime/hostname

Posted by Hari Shreedharan <hs...@cloudera.com>.
Hi Filip, 

The reason for this is that the Text serializer serializes only the body of the event, and the syslog sources write the body of the syslog event into the body of the Flume event. The hostname, timestamp, severity, etc. are added to the Flume event headers. You could simply write a serializer that writes out this information in the format you expect, and you will then see the headers in the files. You could also use the Avro serializer to serialize the events as Avro, which makes sure the headers are written out too.
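
To make the header/body split concrete, here is a simplified sketch of
how a BSD-style syslog line could be taken apart. It is not Flume's
parser: the class name and the regular expression are made up for this
example, the header keys "Facility", "Severity", "timestamp" and "host"
are assumptions, and the timestamp is kept as raw text here rather than
being converted to epoch milliseconds. The facility and severity
arithmetic (priority divided by 8, and priority modulo 8) follows the
syslog priority encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: splits "<PRI>Mmm dd hh:mm:ss host message" into parts. */
final class SyslogSplitSketch {

    private static final Pattern LINE = Pattern.compile(
            "^<(\\d{1,3})>([A-Z][a-z]{2} [ \\d]\\d \\d{2}:\\d{2}:\\d{2}) (\\S+) (.*)$");

    /** Values taken from the syslog header (key names are assumptions). */
    final Map<String, String> headers = new LinkedHashMap<>();

    /** Everything after the hostname; the part a body-only serializer writes. */
    final String body;

    SyslogSplitSketch(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a BSD-style syslog line: " + line);
        }
        int priority = Integer.parseInt(m.group(1));
        headers.put("Facility", String.valueOf(priority / 8));
        headers.put("Severity", String.valueOf(priority % 8));
        headers.put("timestamp", m.group(2)); // raw text in this sketch
        headers.put("host", m.group(3));
        body = m.group(4);
    }

    public static void main(String[] args) {
        SyslogSplitSketch e =
                new SyslogSplitSketch("<13>Oct 16 23:27:55 web01 su: session opened");
        System.out.println("headers = " + e.headers);
        System.out.println("body    = " + e.body);
    }
}
```

Only the value of body reaches the output file when a body-only
serializer is in use, which is why the timestamp and hostname appear to
be lost.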

Hope this helps.


Hari 

-- 
Hari Shreedharan


On Tuesday, October 16, 2012 at 2:27 PM, Filip Slunecko wrote:

> Hi,
> 
> I am trying to use a syslog source and sink it to HDFS or fileroller.
> Everything is working, but the "saved" logs lack timestamp and
> hostname information.
> Is it possible to force flume-ng to dump that information from the syslog
> header together with the body lines?
> 
> I am using flume-ng-agent-1.2.0+24.4-1.noarch from Cloudera repository.
> 
> Thanks,
> 
> Filip 


Re: syslog source - sinks without datetime/hostname

Posted by Roshan Naik <ro...@hortonworks.com>.
Would the host/timestamp interceptors work for you?
http://flume.apache.org/FlumeUserGuide.html#timestamp-interceptor
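
A small model of what such interceptors do may help when weighing this
option. The sketch below is plain Java, not the Flume interceptor API:
the class and method names are made up, and the header keys
"timestamp" and "host" are assumptions. It stamps an event's headers
with the agent's own time and hostname, optionally keeping values that
are already present, and never touches the body.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: models agent-side timestamp and host stamping. */
final class AgentStampSketch {

    /**
     * Returns a copy of the headers with "timestamp" and "host" set from the
     * agent. If preserveExisting is true, values already present are kept.
     */
    static Map<String, String> stamp(Map<String, String> headers, String agentHost,
                                     long nowMillis, boolean preserveExisting) {
        Map<String, String> out = new HashMap<>(headers);
        if (!(preserveExisting && out.containsKey("timestamp"))) {
            out.put("timestamp", Long.toString(nowMillis));
        }
        if (!(preserveExisting && out.containsKey("host"))) {
            out.put("host", agentHost);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fromSyslog = new HashMap<>();
        fromSyslog.put("host", "web01"); // hypothetical origin host from the syslog header
        long now = System.currentTimeMillis();
        System.out.println(stamp(fromSyslog, "flume-agent01", now, false));
        System.out.println(stamp(fromSyslog, "flume-agent01", now, true));
    }
}
```

In this model the stamped values describe the agent, not the machine
that produced the log line, and they still end up only in the headers.
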
-Roshan

On Tue, Oct 16, 2012 at 2:27 PM, Filip Slunecko <fi...@gmail.com> wrote:

> Hi,
>
> I am trying to use a syslog source and sink it to HDFS or fileroller.
> Everything is working, but the "saved" logs lack timestamp and
> hostname information.
> Is it possible to force flume-ng to dump that information from the syslog
> header together with the body lines?
>
> I am using flume-ng-agent-1.2.0+24.4-1.noarch from Cloudera repository.
>
> Thanks,
>
> Filip
>