Posted to user@metron.apache.org by Stefan Kupstaitis-Dunkler <st...@gmail.com> on 2018/12/03 18:32:33 UTC

Raw Message Strategy "Envelope"

Hi,

just out of interest: what is/should be the expected behaviour of the raw
message strategy "ENVELOPE"?


   - Should a parser with this strategy only accept messages that were
   already pre-processed by another parser?
   - Or should a parser like this accept both: direct ingests as well as
   ingests that are chained from a previous parser?


Imagine you have 2 different log sources. One adds a syslog header, the
other doesn't.

Example message from source 1: "<86>Dec 3 18:25:10 my.hostname.com This is
the message"
Example message from source 2: "This is the other message".

The assumption is that both "This is the message" and "This is the other
message" can be parsed using the same pattern.

Would I need to use 3 Kafka topics (1 for the syslog parser, 1 for the
chained parser and another identical one for the direct ingestion) or 2
Kafka topics (1 for the syslog parser, 1 for both the enveloped/chained
source and the "default" source)?

Appreciate your thoughts and comments.

Best,
Stefan
-- 
Stefan Kupstaitis-Dunkler
https://datahovel.com/
https://www.meetup.com/Hadoop-User-Group-Vienna/
https://twitter.com/StefanDunkler

Re: Raw Message Strategy "Envelope"

Posted by Simon Elliston Ball <si...@simonellistonball.com>.
The envelope strategy controls how the parser views its incoming data. In
other words, if the incoming data is JSON, should it treat one field as if
it were the original message, or treat the whole JSON as the original?
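
To make the distinction concrete, here is a minimal Python sketch of the two
views of an incoming record. It is only an illustrative model, not Metron's
implementation: the function name extract_raw, the strategy strings and the
default field name "message" are assumptions made for this example.

```python
import json


def extract_raw(record: bytes, strategy: str = "DEFAULT",
                message_field: str = "message"):
    """Return (raw_message, metadata) the way a parser would view one record.

    Hypothetical helper for illustration only.
    """
    text = record.decode("utf-8")
    if strategy == "DEFAULT":
        # The whole record is the original message; there is no envelope.
        return text, {}
    if strategy == "ENVELOPE":
        # The record is JSON; one field carries the original message and
        # the remaining fields are kept as metadata about it.
        envelope = json.loads(text)
        raw = envelope[message_field]
        metadata = {k: v for k, v in envelope.items() if k != message_field}
        return raw, metadata
    raise ValueError(f"unknown strategy: {strategy}")
```

With the default view, a JSON record would be handed to the parser as one
opaque string; with the envelope view, only the chosen field is parsed and
the other fields travel alongside it.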

To your example, the syslog parser would produce JSON, probably with a
bunch of syslog header fields and a field called something like "message".
(Note that these syslog wrappers may be useful or significant, so you
probably don't want to just throw them away.) This means:

A: (syslog wrapped) -> JSON (with message fields) -> Parser (P) for /This
is the .*message/
B: (raw) -> Parser (P) for /This is the .*message/

So you will need a syslog Kafka topic, a syslog_parsed (JSON) Kafka topic,
and a raw_message topic, along with two copies of P: one with envelope, one
without.

A better answer would be:
A: (syslog wrapped) -> JSON (with message fields)
B: (raw) -> NoOp parser (e.g. Grok GREEDYDATA:message) which wraps into
Metron JSON (with message fields)

Now both of those outputs can go into the same logical input topic for the
common envelope strategy parser:
Parser (P) for /This is the .*message/

It may seem counterintuitive to wrap the raw with an extra parser, but
this means you will end up with one Parser P to anchor things like
enrichment and indexing config off further down the line, instead of two.
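
As a rough sketch of that layout, the following Python models the two flows
converging on a single P. Everything here is hypothetical and stands in for
real Metron parsers and Kafka topics: the function names, the simplified
syslog regex, and the output field names ("original_string", "detail") are
assumptions chosen for the example.

```python
import json
import re

# Simplified syslog header pattern, just enough for the example message.
SYSLOG = re.compile(
    r"<(?P<priority>\d+)>(?P<timestamp>\w{3}\s+\d+\s[\d:]+)\s"
    r"(?P<hostname>\S+)\s(?P<message>.*)"
)
# The shared pattern P from the thread: /This is the .*message/
PATTERN_P = re.compile(r"This is the (?P<detail>.*)message")


def syslog_parser(line: str) -> str:
    """Flow A: strip the syslog header, emit JSON with a 'message' field."""
    m = SYSLOG.fullmatch(line)
    if m is None:
        raise ValueError("not a syslog line")
    return json.dumps(m.groupdict())


def noop_parser(line: str) -> str:
    """Flow B: wrap the raw line into JSON, like Grok GREEDYDATA:message."""
    return json.dumps({"message": line})


def parser_p(envelope_json: str, message_field: str = "message") -> dict:
    """The single envelope-style parser P fed by both flows."""
    envelope = json.loads(envelope_json)
    raw = envelope.pop(message_field)
    m = PATTERN_P.fullmatch(raw)
    if m is None:
        raise ValueError("message does not match P")
    # Keep whatever the upstream parser extracted (e.g. syslog headers).
    return {**envelope, "original_string": raw,
            "detail": m.group("detail").strip()}


a = parser_p(syslog_parser(
    "<86>Dec 3 18:25:10 my.hostname.com This is the message"))
b = parser_p(noop_parser("This is the other message"))
```

Both a and b come out of the same parser_p, so anything configured against P
downstream only has to exist once; a additionally carries the syslog header
fields.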

Simon





-- 
simon elliston ball
@sireb