Posted to user@flume.apache.org by Matt Wise <ma...@nextdoor.com> on 2013/11/10 02:28:09 UTC

Dynamic Key=Value Parsing with an Interceptor?

Hey we'd like to set up a default format for all of our logging systems...
perhaps looking like this:

  "key1=value1;key2=value2;key3=value3...."

With this pattern, we'd allow developers to define any key/value pairs they
want to log, and separate them with a common separator.

If we did this, what would we need to do to get Flume to parse the
key=value pairs into dynamic headers? We pass our data from Flume into
both HDFS and ElasticSearch sinks, and we would really like these fields
sent to the sinks dynamically to make later parsing and analysis much
easier.

Any thoughts on this? I know that we can define a unique interceptor for
each service that looks for explicit field names ... but that's a nightmare
to manage. I really want something truly dynamic.
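
To make "truly dynamic" concrete, here is a minimal Python sketch of the
parsing step itself. It is not Flume code: the function name, the default
separators, and the choice to silently skip malformed segments are all
assumptions made for illustration. It only shows that the header names can
come from the log line rather than from per-service configuration.

```python
def parse_key_values(body, pair_sep=";", kv_sep="="):
    """Split a 'k1=v1;k2=v2' string into a dict of header names to values.

    Hypothetical helper for illustration; separators are assumptions.
    """
    headers = {}
    for item in body.split(pair_sep):
        item = item.strip()
        if not item:
            continue  # skip empty segments, e.g. from a trailing separator
        # Split on the first kv_sep only, so values may contain '='.
        key, found, value = item.partition(kv_sep)
        key = key.strip()
        if not found or not key:
            continue  # not a key=value pair; leave it out of the headers
        headers[key] = value.strip()
    return headers


print(parse_key_values("key1=value1;key2=value2;key3=value3"))
```

Note that a value containing the pair separator (";" here) would be split
incorrectly, so a real format would need an escaping or quoting rule.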

Matt Wise
Sr. Systems Architect
Nextdoor.com

Re: Dynamic Key=Value Parsing with an Interceptor?

Posted by Wolfgang Hoschek <wh...@cloudera.com>.
Consider whether the splitKeyValue command is applicable here, perhaps in combination with readLine, split and grok.

Example is here: http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html#/splitKeyValue

Wolfgang.

On Nov 12, 2013, at 3:18 PM, Matt Wise wrote:

> Paul,
>   Thanks for the feedback. I looked briefly at Morphline, but wasn't sure if it was what I needed. I will take a deeper dive this week and see if it will do what we want. Ultimately the reason we're not changing the apps is that we honestly don't always have a lot of control. Many of the apps are 3rd party apps where we just barely have the ability to adjust their log-line-formats.
> 
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com


Re: Dynamic Key=Value Parsing with an Interceptor?

Posted by Matt Wise <ma...@nextdoor.com>.
Paul,
  Thanks for the feedback. I looked briefly at Morphline, but wasn't sure
if it was what I needed. I will take a deeper dive this week and see if it
will do what we want. Ultimately the reason we're not changing the apps is
that we honestly don't always have a lot of control. Many of the apps are
3rd party apps where we just barely have the ability to adjust their
log-line-formats.

Matt Wise
Sr. Systems Architect
Nextdoor.com


On Mon, Nov 11, 2013 at 3:09 PM, Paul Chavez <
pchavez@verticalsearchworks.com> wrote:

> I think there may be two ‘out of box’ ways to do this kind of thing. First
> would be using the regex extract interceptor with multiple serializers
> keying on various fields. However that’s not really dynamic and just kind
> of a half-step better from one interceptor for each field as you mentioned.
> Second would be to use the morphline interceptor to parse your event body
> and insert headers as needed. I have to admit I have no experience with
> this interceptor but in reading the documentation it seems designed for
> this kind of use case.
>
>
>
> Ultimately though, when faced with this we opted to push this into the app
> layer. Is there a reason the applications can’t write these key/value pairs
> as headers in the first place? We use an HTTP source and when we wrote the
> logging class for it on our app side we put similar functionality in as
> category/subcategory headers. Then flume doesn’t have to have any special
> interceptors beyond a default static one in case the headers are completely
> missing, and we write to HDFS with tokenized paths so each permutation of
> those headers gets a separate directory.
>
>
>
> If you continue to explore this issue, please keep us updated. I
> especially would like to hear some real world morphline examples.
>
>
>
> Hope that helps,
>
> Paul Chavez

RE: Dynamic Key=Value Parsing with an Interceptor?

Posted by Paul Chavez <pc...@verticalsearchworks.com>.
I think there may be two 'out of the box' ways to do this kind of thing. The first would be using the regex extract interceptor with multiple serializers keying on various fields. However, that's not really dynamic and is only a half-step better than one interceptor for each field, as you mentioned. The second would be to use the morphline interceptor to parse your event body and insert headers as needed. I have to admit I have no experience with this interceptor, but from reading the documentation it seems designed for this kind of use case.

Ultimately, though, when faced with this we opted to push it into the app layer. Is there a reason the applications can't write these key/value pairs as headers in the first place? We use an HTTP source, and when we wrote the logging class for it on our app side we built in similar functionality as category/subcategory headers. Flume then doesn't need any special interceptors beyond a default static one in case the headers are completely missing, and we write to HDFS with tokenized paths so each permutation of those headers gets a separate directory.
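
A rough Python sketch of that app-layer idea, under stated assumptions: the
payload shape follows the JSON array of {"headers", "body"} objects that the
HTTP source's default JSON handler accepts, while the function names, the
"unknown" fallback values, and the path template are hypothetical. The
fallback dict only mimics what a static interceptor default would do, and
expand_path only imitates %{header} substitution for header names made of
word characters; neither is Flume's own code.

```python
import json
import re

# Hypothetical fallback values, standing in for a static interceptor default.
DEFAULT_HEADERS = {"category": "unknown", "subcategory": "unknown"}


def build_http_events(body, headers):
    """Serialize one event as a JSON array of {"headers", "body"} objects."""
    merged = dict(DEFAULT_HEADERS)
    # Header names and values are sent as strings.
    merged.update({str(k): str(v) for k, v in headers.items()})
    return json.dumps([{"headers": merged, "body": body}])


def expand_path(template, headers):
    """Replace each %{name} token with the matching header value.

    Raises KeyError if a referenced header is missing (a choice made
    for this sketch, not a statement about Flume's behavior).
    """
    return re.sub(r"%\{(\w+)\}", lambda m: headers[m.group(1)], template)


payload = build_http_events("login ok", {"category": "auth"})
event = json.loads(payload)[0]
print(payload)
print(expand_path("/flume/%{category}/%{subcategory}", event["headers"]))
```

Each distinct combination of category and subcategory expands to its own
directory, which is the "separate directory per permutation" effect
described above.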

If you continue to explore this issue, please keep us updated. I would especially like to hear some real-world morphline examples.

Hope that helps,
Paul Chavez


From: Matt Wise [mailto:matt@nextdoor.com]
Sent: Monday, November 11, 2013 10:04 AM
To: user@flume.apache.org
Subject: Re: Dynamic Key=Value Parsing with an Interceptor?

Anyone have any ideas on the best way to do this?

Matt Wise
Sr. Systems Architect
Nextdoor.com



Re: Dynamic Key=Value Parsing with an Interceptor?

Posted by Matt Wise <ma...@nextdoor.com>.
Anyone have any ideas on the best way to do this?

Matt Wise
Sr. Systems Architect
Nextdoor.com


On Sat, Nov 9, 2013 at 5:28 PM, Matt Wise <ma...@nextdoor.com> wrote:

> Hey we'd like to set up a default format for all of our logging systems...
> perhaps looking like this:
>
>   "key1=value1;key2=value2;key3=value3...."
>
> With this pattern, we'd allow developers to define any key/value pairs
> they want to log, and separate them with a common separator.
>
> If we did this, what do we need to do in Flume to get Flume to parse out
> the key=value pairs into dynamic headers? We pass our data from Flume into
> both HDFS and ElasticSearch sinks. We would really like to have these
> fields dynamically sent to the sinks for much easier parsing and analysis
> later.
>
> Any thoughts on this? I know that we can define a unique interceptor for
> each service that looks for explicit field names ... but that's a nightmare
> to manage. I really want something truly dynamic.
>
> Matt Wise
> Sr. Systems Architect
> Nextdoor.com
>