Posted to users@nifi.apache.org by Louis-Étienne Dorval <le...@gmail.com> on 2016/01/08 17:36:01 UTC

Merge ListenSyslog events

Hi everyone!

I'm looking to use the new ListenSyslog processor in a proof-of-concept
project, but I've run into a problem that I can't find a suitable
solution for (yet!).
I'm receiving logs from multiple Java-based servers using a logback/log4j
SyslogAppender. The messages are received successfully, but when a stack
trace occurs, each of its lines ends up in its own FlowFile.

I'm trying to achieve something like the following:
http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/Indexmulti-lineevents

I tried:
- Increasing the "Max Batch Size", but I end up merging lines that should
not be merged, and there's no way to know the length of the stack trace...
- Using MergeContent with the host as the "Correlation Attribute Name",
but as before I merge lines that should not be merged
- Using MergeContent followed by SplitContent, which might work, but
SplitContent is pretty restrictive and I can't find a "Byte Sequence"
that reliably separates stack traces (see the sketch below)
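
To give an idea of the boundary detection I'd need, here is a rough Java
sketch of a heuristic for spotting stack-trace continuation lines (just an
illustration of the problem; SplitContent can't do this, since it only
matches a literal byte sequence):

    import java.util.regex.Pattern;

    public class StackTraceHeuristic {
        // Lines that continue a Java stack trace rather than start a new
        // event, e.g.:
        //   "\tat com.example.Foo.bar(Foo.java:42)"
        //   "Caused by: java.io.IOException: ..."
        //   "\t... 17 more"
        private static final Pattern CONTINUATION = Pattern.compile(
                "^(\\s+at\\s.+|Caused by:.+|\\s+\\.\\.\\. \\d+ more)$");

        public static boolean continuesPreviousEvent(String line) {
            return CONTINUATION.matcher(line).matches();
        }
    }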

Even if I found a magic "Byte Sequence" for my last attempt (MergeContent +
SplitContent), I would most probably lose part of the stack trace, since
MergeContent is limited by the "Max Batch Size".


The only solution I see is to modify ListenSyslog to add a parameter
similar to what the Splunk documentation describes, and use that rather
than a fixed "Max Batch Size".

Am I missing another option?
Would that be a suitable feature? (Maybe I should ask that question on the
dev mailing list.)

Best regards!

Re: Merge ListenSyslog events

Posted by Bryan Bende <bb...@gmail.com>.
Makes sense about not wanting to change the logging configurations.

Thanks for taking the time to capture that issue in JIRA. I would say you
are already following the process by discussing the change with the
community and putting in a very descriptive JIRA :)

On a side note, I've been working on NIFI-1273 for the past week [1] and as
part of the ticket I've refactored some of the internals of ListenSyslog
and moved a lot of the inner classes to their own regular classes. While
doing this I was also considering your point about pattern matching for the
end of messages, and I tried to create an extension point that would let us
support different message delimiters in the future. It may not be perfect,
but I think it will make it slightly easier to make some of the changes you
are looking for.
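
To give a rough idea of the direction, the extension point looks something
like this (a simplified sketch only; the actual interface in the pull
request may differ):

    import java.util.List;

    // Hypothetical shape of the extension point -- not the final API.
    public interface MessageDemarcator {
        // Feed raw bytes read from the channel; returns any complete
        // messages found, buffering a partial trailing message internally
        // until its delimiter (a new-line today, possibly a pattern later)
        // is seen.
        List<byte[]> feed(byte[] buffer, int length);
    }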

It will probably take a few more days before I can get a pull request
submitted, but just wanted to point this out so we can coordinate.

Thanks,

Bryan

[1] https://issues.apache.org/jira/browse/NIFI-1273


Re: Merge ListenSyslog events

Posted by Louis-Étienne Dorval <le...@gmail.com>.
Hi,

Thanks for the reply Bryan.

I'd rather not update the logback/log4j configuration because the service
is already in place, and for now I'm just trying to fit around the current
system. Anyway, according to the RFC (RFC 3164), a syslog message must not
be longer than 1024 bytes, so a single "event" might be split anyway.

I've created NIFI-1392 for this feature. I'm not sure of the process for a
feature request, but I'll try to find some time to create a pull request
or a patch for it.


Best regards,
Louis-Etienne


Re: Merge ListenSyslog events

Posted by Bryan Bende <bb...@gmail.com>.
Hello,

Glad to hear you are getting started using ListenSyslog!

You are definitely running into something that we should consider
supporting. The current implementation treats each new-line as the message
delimiter and places each message onto a queue.

When the processor is triggered, it grabs messages from the queue up to
the "Max Batch Size". So in the default case it grabs a single message
from the queue, which in your case is a single line from one of the
multi-line messages, and produces a FlowFile. When "Max Batch Size" is set
higher, say to 100, it grabs up to 100 messages and produces a FlowFile
containing all 100 messages.
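
In simplified form, the batching loop behaves roughly like this (an
illustration only, not the actual processor source):

    import java.util.concurrent.BlockingQueue;

    // Drain up to "Max Batch Size" messages into the content of one
    // FlowFile.
    static String drainBatch(BlockingQueue<String> messages, int maxBatchSize) {
        StringBuilder flowFileContent = new StringBuilder();
        for (int i = 0; i < maxBatchSize; i++) {
            String message = messages.poll();  // non-blocking grab
            if (message == null) {
                break;                         // queue is empty, stop early
            }
            flowFileContent.append(message).append('\n');
        }
        return flowFileContent.toString();
    }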

The messages in the queue arrive simultaneously from all of the incoming
connections, which is why you don't see all the lines from one server
grouped together. Imagine the queue holding something like:

java-server-1 message1 line1
java-server-2 message1 line1
java-server-1 message1 line2
java-server-3 message1 line1
java-server-2 message1 line2
....

I would need to dig into that Splunk documentation a little more, but I
think you are right that we could expose some kind of message delimiter
pattern on the processor, which would be applied when reading the
messages, before they even make it into the queue, so that by the time a
message is put on the queue it would contain all of the lines from one
event.
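
The idea would be something like this sketch, which buffers lines until
the next one looks like the start of a new syslog message (here I'm using
the "<PRI>" field at the front of a syslog line as the start pattern,
purely as an example):

    import java.util.regex.Pattern;

    // Sketch of the idea only, not an actual NiFi class.
    class PatternDemarcator {
        private static final Pattern MESSAGE_START = Pattern.compile("^<\\d+>.*");
        private final StringBuilder current = new StringBuilder();

        // Returns a completed multi-line message, or null if more lines
        // are needed.
        String onLine(String line) {
            String completed = null;
            if (current.length() > 0 && MESSAGE_START.matcher(line).matches()) {
                completed = current.toString();
                current.setLength(0);
            }
            current.append(line).append('\n');
            return completed;
        }
    }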

Given the current situation, there might be one other option for you. Are
you able to control/change the logback/log4j configuration for the servers
sending the logs?

If so, a JSON layout might solve the problem. These configuration files
show how to do that:
https://github.com/bbende/jsonevent-producer/tree/master/src/main/resources

I know this worked well with the ListenUDP processor to ensure that an
entire stack trace was sent as a single JSON document, but I have not had a
chance to try it with ListenSyslog and the SyslogAppender.
If you are using ListenSyslog with TCP, then it will probably come down to
whether logback/log4j puts new-lines inside the JSON document, or only a
single new-line at the end.
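
In the good case, an entire event arrives as a single line of JSON with
the stack trace new-lines escaped, something like this (field names are
just illustrative and depend on the layout you configure):

    {"host":"java-server-1","level":"ERROR","message":"Request failed",
     "stacktrace":"java.io.IOException: boom\n\tat com.example.Foo.bar(Foo.java:42)\n\t... 17 more"}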

-Bryan


On Fri, Jan 8, 2016 at 11:36 AM, Louis-Étienne Dorval <le...@gmail.com>
wrote:

> Hi everyone!
>
> I'm looking to use the new ListenSyslog processor in a proof-of-concept
> [project but I encounter a problem that I can find a suitable solution
> (yet!).
> I'm receiving logs from multiple Java-based server using a logback/log4j
> SyslogAppender. The messages are received successfully but when a stack
> trace happens, each lines are broken into single FlowFile.
>
> I'm trying to achieve something like the following:
> http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/Indexmulti-lineevents
>
> I tried:
> - Increasing the "Max Batch Size", but I end up merging lines that should
> not be merge and there's no way to know then length of the stack trace...
> - Use MergeContent using the host as "Correlation Attribute Name", but as
> before I merge lines that should not be merge
> - Use MergeContent followed by SplitContent, that might work but the
> SplitContent is pretty restrictive and I can't find a "Byte Sequence" that
> are different from stack trace.
>
> Even if I find a magic "Byte Sequence" for my last try (MergeContent +
> SplitContent), I would most probably lose a part of the stacktrace as the
> MergeContent is limited by the "Max Batch Size"
>
>
> The only solution that I see is to modify the ListenSyslog to add some
> similar parameter as the Splunk documentation explains and use that rather
> than a fixed "Max Batch Size".
>
> Am I missing a another option?
> Would that be a suitable feature? (maybe I should ask that question in the
> dev mailing list)
>
> Best regards!
>