Posted to dev@flume.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2013/11/12 06:09:19 UTC

MorphlineInterceptor questions

Hi,

While poking around MorphlineSolrSink I got intrigued by
MorphlineInterceptor in ...solr.morphline package.  A few Qs:

1) This is also not Solr-specific, right?

2) I couldn't find any code in ...solr.morphline package that actually
uses this MorphlineInterceptor... is it not used?

3) I see Morphline command's "process(...)" method being called from
both MorphlineInterceptor AND from MorphlineHandlerImpl.  How come?  My
impression is that MorphlineHandlerImpl code is what is actually meant
to be used, while MorphlineInterceptor doesn't seem to be used....
what am I missing? :)

4) I found the following in the Flume Guide: "This interceptor is not
intended for heavy duty ETL processing - if you need this consider
moving ETL processing from the Flume Source to a Flume Sink".
Why should one not use MorphlineInterceptor for heavy duty ETL processing?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

Re: MorphlineInterceptor questions

Posted by Wolfgang Hoschek <wh...@cloudera.com>.
On Nov 11, 2013, at 9:09 PM, Otis Gospodnetic wrote:

> Hi,
> 
> While poking around MorphlineSolrSink I got intrigued by
> MorphlineInterceptor in ...solr.morphline package.  A few Qs:
> 
> 1) This is also not Solr-specific, right?

yep

> 
> 2) I couldn't find any code in ...solr.morphline package that actually
> uses this MorphlineInterceptor... is it not used?

In Flume an Interceptor is a separate concept from a Sink. You can use the Interceptor without the Sink, and vice versa.

> 
> 3) I see Morphline command's "process(...)" method being called from
> both MorphlineInterceptor AND from MorphlineHandlerImpl.  How come?  My
> impression is that MorphlineHandlerImpl code is what is actually meant
> to be used, while MorphlineInterceptor doesn't seem to be used....
> what am I missing? :)
> 
> 4) I found the following in the Flume Guide: "This interceptor is not
> intended for heavy duty ETL processing - if you need this consider
> moving ETL processing from the Flume Source to a Flume Sink".
> Why should one not use MorphlineInterceptor for heavy duty ETL processing?

Two reasons: 

1) Interceptors run in the thread of the Flume Source, and are thus tightly coupled to the Flume Source and its I/O handler. It's safer not to block or fail in that thread - better to hand data off that thread as soon as possible into the Flume Channel (i.e. a queue from which sinks take events; sinks run in another thread and are thus more isolated).

2) Flume Interceptors have the limitation that they can only generate zero or one output event for each input event. So generating N events for an input event isn't possible, as one might want when emitting one event per input line, one event per input column, one event per email attachment, etc.
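A minimal sketch of reason 1, in Java (the language Flume is written in). This is a toy model, not the Flume API: the class `SourceThreadModel`, its `run` method, and the use of a `java.util.concurrent.ArrayBlockingQueue` as a stand-in for the Flume Channel are all hypothetical. The calling thread plays the Source and applies the interceptor inline; a separate thread plays the Sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.UnaryOperator;

// Toy model of the threading described above; NOT the Flume API.
// The calling thread plays the Flume Source, a bounded queue plays the Channel,
// and a second thread plays the Sink.
public class SourceThreadModel {
    private static final String END = "\u0000END"; // sentinel: tells the sink to stop

    static List<String> run(List<String> input, UnaryOperator<String> interceptor)
            throws InterruptedException {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(16);
        List<String> received = new ArrayList<>();

        Thread sink = new Thread(() -> {
            try {
                for (String e = channel.take(); !e.equals(END); e = channel.take()) {
                    // Heavier processing belongs here: it runs on the sink's own thread.
                    received.add(e.toUpperCase());
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }, "sink");
        sink.start();

        try {
            for (String raw : input) {
                // The interceptor runs inline on the source thread: time spent here delays
                // the next read, and an exception thrown here propagates out of the source.
                String e = interceptor.apply(raw);
                if (e != null) {        // null means "drop this event"
                    channel.put(e);     // hand off to the channel as soon as possible
                }
            }
        } finally {
            channel.put(END);           // always release the sink thread
            sink.join();                // join() makes 'received' safely visible here
        }
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        // The blank event is dropped by the interceptor; prints [A, B]
        System.out.println(run(List.of("a", "", "b"), s -> s.isEmpty() ? null : s));
    }
}
```

In this model a slow or throwing interceptor stalls or aborts the source loop directly, before the hand-off to the channel ever happens, whereas work placed in the sink thread only affects the source indirectly, once the bounded channel fills up.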

To summarize, the reasons aren't specific to morphlines, they are rooted in the way Flume has designed the concept of Interceptors. 
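Reason 2 can be illustrated by contrasting two function shapes. The sketch below uses hypothetical interfaces (`OneToAtMostOne`, `OneToMany`) written only for illustration; they are not Flume or morphline types. The first mirrors the interceptor limitation described above (zero or one output event per input event); the second is the kind of fan-out, here one event per input line, that per the reasoning above belongs in a Sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Illustrative shapes only; these are NOT Flume or morphline APIs.
public class FanOutShapes {

    // Interceptor-style: one input event yields zero or one output event.
    interface OneToAtMostOne {
        Optional<String> apply(String event);
    }

    // Sink-side style: one input event may yield any number of output events.
    interface OneToMany {
        List<String> apply(String event);
    }

    // Drops blank events, otherwise passes the event through trimmed (0 or 1 out).
    static final OneToAtMostOne DROP_BLANK =
            e -> Optional.of(e.trim()).filter(s -> !s.isEmpty());

    // Emits one event per non-blank input line (N out); the 0-or-1 shape cannot express this.
    static final OneToMany SPLIT_LINES = e -> {
        List<String> out = new ArrayList<>();
        for (String line : e.split("\n")) {
            if (!line.trim().isEmpty()) {
                out.add(line);
            }
        }
        return out;
    };

    public static void main(String[] args) {
        String multiLine = "first\nsecond\nthird";
        // At best the whole multi-line record passes through as a single event: prints 1
        System.out.println(DROP_BLANK.apply(multiLine).isPresent() ? 1 : 0);
        // Fan-out produces one event per line: prints 3
        System.out.println(SPLIT_LINES.apply(multiLine).size());
    }
}
```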

Wolfgang.
