Posted to user@flume.apache.org by "Cochran, David" <da...@bsee.gov> on 2013/04/05 15:52:42 UTC

Dupes

I'm seeing a LOT of random dupes in some of my log files....

This is pretty consistent in one log in particular. The file being tail'ed
averages ~20M per day, every day.  On the only sink (FILE_ROLL), the
resulting 24-hour log is 55M.  Some quick counts from grep'ing a random time
(e.g. 07:23) show the sink log with a dozen or so more lines than the source
for the same timestamp, every minute.
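
The per-minute grep counts described above could be automated along these
lines. This is only a minimal sketch: the function names are made up for
illustration, and it assumes every log line carries an HH:MM timestamp
somewhere in its text.

```python
import re
from collections import Counter

# Assumption: each log line contains an HH:MM timestamp somewhere in its text.
MINUTE_RE = re.compile(r"\b(?:[01]\d|2[0-3]):[0-5]\d\b")


def count_per_minute(lines):
    """Return a Counter mapping 'HH:MM' to the number of lines with that stamp."""
    counts = Counter()
    for line in lines:
        match = MINUTE_RE.search(line)
        if match:
            # Only the first HH:MM found on the line is counted.
            counts[match.group(0)] += 1
    return counts


def surplus_minutes(source_lines, sink_lines):
    """Return {minute: extra_lines} for minutes where the sink has more lines."""
    source = count_per_minute(source_lines)
    sink = count_per_minute(sink_lines)
    return {m: sink[m] - source[m] for m in sorted(sink) if sink[m] > source[m]}
```

Passing the two open files (source log first, sink log second) to
surplus_minutes() would list only the minutes where the sink is ahead. The
two files need to cover the same time window for the numbers to mean much.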

This has been happening like clockwork every day for the last couple of
months, ever since I started using Flume on this box.

I did check that there wasn't another source from this or another server
sending to the same port...and the entries of the log file look proper for
that app.

The logs are not rolling at the same time on the source/sink and I've not
yet taken the time to set up copies of each beginning and ending at the same
times and run a diff against them, but a preliminary 'eyeball diff' just
shows dupes.  I will note that on the source a line with the exact same text
may appear more than once, as the logging mechanism does not log more
precisely than hour/minute.
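
Because identical lines can legitimately repeat on the source, a plain diff
may be hard to read. One alternative is to compare how many times each line
occurs on each side. The following is a rough sketch of that idea; the
function names are hypothetical, and it assumes both inputs have already
been trimmed to the same time window.

```python
from collections import Counter


def _line_counts(lines):
    """Count occurrences of each line, ignoring the trailing newline."""
    return Counter(line.rstrip("\n") for line in lines)


def extra_in_sink(source_lines, sink_lines):
    """Lines that occur more often in the sink than in the source (dupes)."""
    # Counter subtraction keeps only positive differences.
    return _line_counts(sink_lines) - _line_counts(source_lines)


def missing_from_sink(source_lines, sink_lines):
    """Lines that occur more often in the source than in the sink (drops)."""
    return _line_counts(source_lines) - _line_counts(sink_lines)
```

A line logged twice on the source and three times on the sink shows up in
extra_in_sink() with a count of 1, so legitimate repeats are not reported
as dupes.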

All in all, dupes are better than drops, but is there anything in
particular I should look for to try to find the cause of and eliminate this?


Thanks in advance for any thoughts,
Dave

Re: Dupes

Posted by "Cochran, David" <da...@bsee.gov>.
Hi Israel,

I copied out the portions of my config that pertain to the server where I'm
seeing this bad behavior (and sanitized it a little).  Otherwise my config
is about 1400 lines now; I'm trying to stay with a single config and
distribute it to each server for consistency, to save headaches.

Dave



# define sources, channels, and sinks for each log file
node_usvsm01.sources = source21061 source21062 source21063
node_usvsm01.channels = channel21061 channel21062 channel21063
node_usvsm01.sinks = sink21061 sink21062 sink21063

# source file usvsm01 - smaccess.log
node_usvsm01.sources.source21061.type = exec
node_usvsm01.sources.source21061.command = tail -F /opt/siteminder/CA/log/smaccess.log
node_usvsm01.sources.source21061.channels = channel21061
# source file imsnolusvsm01 - smps.log
node_usvsm01.sources.source21062.type = exec
node_usvsm01.sources.source21062.command = tail -F /opt/siteminder/CA/log/smps.log
node_usvsm01.sources.source21062.channels = channel21062
# source file imsnolusvsm01 - smtracedefault.log
node_usvsm01.sources.source21063.type = exec
node_usvsm01.sources.source21063.command = tail -F /opt/siteminder/CA/log/smtracedefault.log
node_usvsm01.sources.source21063.channels = channel21063

node_usvsm01.channels.channel21061.type = memory
node_usvsm01.channels.channel21061.capacity = 100000
node_usvsm01.channels.channel21061.transactionCapacity = 1000
node_usvsm01.channels.channel21062.type = memory
node_usvsm01.channels.channel21062.capacity = 100000
node_usvsm01.channels.channel21062.transactionCapacity = 1000
node_usvsm01.channels.channel21063.type = memory
node_usvsm01.channels.channel21063.capacity = 100000
node_usvsm01.channels.channel21063.transactionCapacity = 1000

# send channels --> flume @ usvinf01
node_usvsm01.sinks.sink21061.type = avro
node_usvsm01.sinks.sink21061.channel = channel21061
node_usvsm01.sinks.sink21061.hostname = usinf01
node_usvsm01.sinks.sink21061.port = 21061
node_usvsm01.sinks.sink21062.type = avro
node_usvsm01.sinks.sink21062.channel = channel21062
node_usvsm01.sinks.sink21062.hostname = usinf01
node_usvsm01.sinks.sink21062.port = 21062
node_usvsm01.sinks.sink21063.type = avro
node_usvsm01.sinks.sink21063.channel = channel21063
node_usvsm01.sinks.sink21063.hostname = usinf01
node_usvsm01.sinks.sink21063.port = 21063


node102.sources = source21061 source21062 source21063
node102.channels = channel21061 channel21062 channel21063
node102.sinks = sink21061 sink21062 sink21063

#  - usvsm01 -
# source file usvsm01 - smaccess.log
node102.sources.source21061.type = avro
node102.sources.source21061.bind = 0.0.0.0
node102.sources.source21061.port = 21061
node102.sources.source21061.channels = channel21061
# source file usvsm01 - smps.log
node102.sources.source21062.type = avro
node102.sources.source21062.bind = 0.0.0.0
node102.sources.source21062.port = 21062
node102.sources.source21062.channels = channel21062
# source file usvsm01 - smtracedefault.log
node102.sources.source21063.type = avro
node102.sources.source21063.bind = 0.0.0.0
node102.sources.source21063.port = 21063
node102.sources.source21063.channels = channel21063

#  - usvsm01 -
node102.channels.channel21061.type = memory
node102.channels.channel21061.capacity = 100000
node102.channels.channel21061.transactionCapacity = 1000
node102.channels.channel21062.type = memory
node102.channels.channel21062.capacity = 100000
node102.channels.channel21062.transactionCapacity = 1000
node102.channels.channel21063.type = memory
node102.channels.channel21063.capacity = 100000
node102.channels.channel21063.transactionCapacity = 1000

# usvsm01 -
# source file usvsm01 - smaccess.log
node102.sinks.sink21061.type = FILE_ROLL
node102.sinks.sink21061.channel = channel21061
node102.sinks.sink21061.sink.directory = /flume_logs/usvsm01/siteminder/smaccess_log
node102.sinks.sink21061.sink.rollInterval = 86400
node102.sinks.sink21061.sink.serializer = TEXT
# source file usvsm01 - smps.log
node102.sinks.sink21062.type = FILE_ROLL
node102.sinks.sink21062.channel = channel21062
node102.sinks.sink21062.sink.directory = /flume_logs/usvsm01/siteminder/smps_log
node102.sinks.sink21062.sink.rollInterval = 86400
node102.sinks.sink21062.sink.serializer = TEXT
# source file usvsm01 - smtracedefault.log
node102.sinks.sink21063.type = FILE_ROLL
node102.sinks.sink21063.channel = channel21063
node102.sinks.sink21063.sink.directory = /flume_logs/usvsm01/siteminder/smtracedefault_log
node102.sinks.sink21063.sink.rollInterval = 86400
node102.sinks.sink21063.sink.serializer = TEXT





On Fri, Apr 5, 2013 at 9:00 AM, Israel Ekpo <is...@aicer.org> wrote:

> Hi Dave,
>
> Could you post your agent's configuration file?
>
> Sometimes, small misconfigurations can result in unintended or
> undefined behavior.

Re: Dupes

Posted by Israel Ekpo <is...@aicer.org>.
Hi Dave,

Could you post your agent's configuration file?

Sometimes, small misconfigurations can result in unintended or undefined
behavior.
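
One kind of small misconfiguration that is easy to miss is a misspelled
property name, for example a hypothetical `transactionCapactiy` in place of
`transactionCapacity`. Below is a rough sketch of a checker for that. The
function name is made up, it only looks at channel properties, and the set
of known keys is an assumption covering the memory channel only; it should
be checked against the user guide for the Flume version in use.

```python
import re

# Assumption: memory channel property names as listed in the Flume user
# guide; verify against the documentation for the version in use.
KNOWN_MEMORY_CHANNEL_KEYS = {
    "type",
    "capacity",
    "transactionCapacity",
    "keep-alive",
    "byteCapacity",
    "byteCapacityBufferPercentage",
}

# Matches "<agent>.channels.<channel>.<key> =" and captures the key.
CHANNEL_LINE = re.compile(r"^\s*([\w-]+)\.channels\.([\w-]+)\.([\w.-]+)\s*=")


def unknown_channel_keys(config_text):
    """Return (line_number, key) pairs for channel keys not in the known set."""
    problems = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = CHANNEL_LINE.match(line)
        if match and match.group(3) not in KNOWN_MEMORY_CHANNEL_KEYS:
            problems.append((number, match.group(3)))
    return problems
```

Running unknown_channel_keys() over the text of a config file returns the
line number and name of each channel property it does not recognize.
Channels of other types would need their own key sets.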


