Posted to user@flume.apache.org by Brian Hart <bb...@bbhart.com> on 2012/07/03 05:54:38 UTC

Preserving syslog information

I'm working on a project where DNS & DHCP log data need to be aggregated from 180+ servers spread around the WAN down to one (maybe two) centralized servers.  From the central server(s), I'll need to scp them to another company periodically throughout the day.  It's not critical for each message to reach the central servers, but it'd be really nice if they did.  

I have some architecture questions, but my blocker right now is that my syslog messages are only coming across to the central server as "<sending user>: <log text>" (e.g. "hart_b: This is test 1") and I'm losing the other syslog info like date, hostname, and facility.

I searched the mailing list and wiki, but I can't figure out how to do this in 1.1.0-incubating.  Syslog on my test DHCP server points to the IP for 'remote1', and you can see the rest in my conf file (below).  I think I'm supposed to use the syslog serializer, but I'm not clear on how to do that.

# CENTRAL NODE
central.channels.ch1.type = memory

central.sources.avro-source1.channels = ch1
central.sources.avro-source1.type = avro
central.sources.avro-source1.bind = 0.0.0.0
central.sources.avro-source1.port = 41414

central.sinks.fileroll_sink1.channel = ch1
central.sinks.fileroll_sink1.type = file_roll
central.sinks.fileroll_sink1.sink.directory = /opt/logs_from_flume/
central.sinks.fileroll_sink1.sink.rollInterval = 30

central.channels = ch1
central.sources = avro-source1
central.sinks = fileroll_sink1

# REMOTE NODE 1 - North America
remote1.channels.ch1.type = memory

remote1.sources.syslog-source1.channels = ch1
remote1.sources.syslog-source1.type = syslogudp
remote1.sources.syslog-source1.host = 0.0.0.0
remote1.sources.syslog-source1.port = 514

remote1.sinks.avro-sink1.channel = ch1
remote1.sinks.avro-sink1.type = avro
remote1.sinks.avro-sink1.hostname = 192.168.1.60
remote1.sinks.avro-sink1.port = 41414
remote1.sinks.avro-sink1.batch-size = 100

remote1.channels = ch1
remote1.sources = syslog-source1
remote1.sinks = avro-sink1

-=-=-
Apologies for asking what might be a basic question, but how can I preserve the syslog info so that it makes it into the rolling files on Central?

Thanks,
Brian
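
For context on the symptom described above, here is a minimal Python sketch of how an RFC 3164 line divides into header fields (priority, timestamp, host) and a body. The function name, the regular expression, and the dictionary keys are illustrative assumptions, not Flume's own code. The <30> priority is simply what daemon.info works out to (facility 3 * 8 + severity 6). One plausible reading, which this thread does not confirm, is that the syslog source keeps the parsed fields as event headers and only the body ends up in the file_roll output.

```python
import re

# RFC 3164 layout: <PRI>Mmm dd hh:mm:ss HOSTNAME TAG: text
# (illustrative pattern, not the one Flume uses)
RFC3164 = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<body>.*)$"
)


def split_syslog(line):
    """Split one RFC 3164 line into (header fields, remaining body)."""
    m = RFC3164.match(line)
    if m is None:
        # Unparseable input: keep everything as the body, with no headers.
        return {}, line
    pri = int(m.group("pri"))
    headers = {
        "facility": pri // 8,   # 3 = daemon
        "severity": pri % 8,    # 6 = info
        "timestamp": m.group("timestamp"),
        "host": m.group("host"),
    }
    return headers, m.group("body")


headers, body = split_syslog("<30>Jul  4 15:42:42 serverA hart_b: This is a test.")
print(headers)
print(body)  # hart_b: This is a test.
```

The body that is left over has the same "<sending user>: <log text>" shape that shows up on the central server, so the date, host, and facility may be travelling separately from the text rather than being lost on the wire.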



Re: Preserving syslog information

Posted by Hari Shreedharan <hs...@cloudera.com>.
Brent,

I have assigned FLUME-1277 to you. Please post the patch to review board as
mentioned on the bug, for faster review.

Thanks
Hari

On Tue, Jul 3, 2012 at 11:44 AM, Brent Halsey <mr...@gmail.com> wrote:

> It's possible that you've run into FLUME-1277 "Error parsing Syslog
> rfc 3164 messages with null values".  Basically, the date is skipped,
> and null values (hyphens) are interpreted as a null date.  Potential
> fixes are to use FLUME-1277's patch, make sure you don't have hyphens
> in your syslog message, or change the date format to rfc 5424 style.
>
> The flume syslog parser doesn't extract syslog tags (program name),
> either.  We've just started patching SyslogUtils.java to pull this
> out.
>
>  -brent

RE: Preserving syslog information

Posted by Brian Hart <bb...@bbhart.com>.
Brent,
Thanks for the response. When I used logger to generate a 'This is a test.'
message (logger -p daemon.info This is a test), in my local syslog I see
"Jul  4 15:42:42 serverA hart_b: This is a test." but on my Central server
the entire message is "hart_b: This is a test.".  The date and host are
dropped for some reason.  This is the case on 1.1 and 1.3 (rev 1357365).
Recall this is a syslog source and avro sink on one server, feeding an avro
source and file_roll sink on the other.

If I send BIND messages through Flume 1.3, those do appear complete on both
sides, including the program name:  "named[1361]: 04-Jul-2012 15:41:39.083
queries: client 192.168.1.107#64184: query: www.cloudera.com IN AAAA +
(192.168.1.101)".  I think we're good there.

Now that I have some full messages flowing courtesy of BIND, I'm hitting
buffer size problems.  I'll start a separate thread for that.

Thanks again,
Brian
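
The logger test above can also be reproduced without the local syslog daemon in the path, which may help narrow down where the date and host go missing. Below is a rough Python sketch that builds the RFC 3164 datagram corresponding to daemon.info and can send it over UDP. The helper names, the abbreviated facility table, and the target address in the final comment are assumptions made for illustration; what a real syslog daemon puts on the wire when forwarding can differ.

```python
import socket

# Abbreviated lookup tables; the numeric codes follow RFC 3164.
FACILITIES = {"kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4, "local0": 16}
SEVERITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}


def build_rfc3164(facility, severity, timestamp, host, tag, text):
    """Return '<PRI>TIMESTAMP HOST TAG: text' with PRI = facility * 8 + severity."""
    pri = FACILITIES[facility] * 8 + SEVERITIES[severity]
    return f"<{pri}>{timestamp} {host} {tag}: {text}"


def send_udp(message, host, port=514):
    """Send one syslog datagram to the given UDP listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii"), (host, port))


msg = build_rfc3164("daemon", "info", "Jul  4 15:42:42", "serverA",
                    "hart_b", "This is a test.")
print(msg)  # <30>Jul  4 15:42:42 serverA hart_b: This is a test.

# To send it to the syslogudp source (placeholder address, adjust as needed):
# send_udp(msg, "remote1.example.com")
```

Sending a hand-built line like this makes it possible to compare what the source was given with what lands in the rolled file on Central.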


Re: Preserving syslog information

Posted by Brent Halsey <mr...@gmail.com>.
It's possible that you've run into FLUME-1277 "Error parsing Syslog
rfc 3164 messages with null values".  Basically, the date is skipped,
and null values (hyphens) are interpreted as a null date.  Potential
fixes are to use FLUME-1277's patch, make sure you don't have hyphens
in your syslog message, or change the date format to rfc 5424 style.

The flume syslog parser doesn't extract syslog tags (program name),
either.  We've just started patching SyslogUtils.java to pull this
out.

 -brent
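
To make the two date styles mentioned above concrete, here is a small Python sketch that accepts either an RFC 3164 timestamp ("Mmm dd hh:mm:ss") or an RFC 5424 one (ISO 8601 style), and treats a lone hyphen as the RFC 5424 nil value. This is not the logic in SyslogUtils.java or in the FLUME-1277 patch; the function name and the default year are assumptions for illustration only.

```python
from datetime import datetime


def parse_timestamp(token, year=2012):
    """Parse a syslog timestamp in RFC 5424 or RFC 3164 form; None for '-'."""
    if token == "-":
        # RFC 5424 NILVALUE: the sender supplied no timestamp.
        return None
    iso = token[:-1] + "+00:00" if token.endswith("Z") else token
    try:
        # RFC 5424 style, e.g. 2012-07-04T15:42:42Z
        return datetime.fromisoformat(iso)
    except ValueError:
        pass
    # RFC 3164 style carries no year, so one has to be supplied.
    # %b assumes English month abbreviations (C/English locale).
    return datetime.strptime(f"{year} {token}", "%Y %b %d %H:%M:%S")


print(parse_timestamp("Jul  4 15:42:42"))
print(parse_timestamp("2012-07-04T15:42:42Z"))
print(parse_timestamp("-"))
```

The point of the sketch is only that a hyphen is a legitimate "no value" marker in one format and ordinary text in the other, so a parser has to know which format it is reading before deciding what a hyphen means.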
