Posted to user@flume.apache.org by Souvik Bose <so...@delgence.com> on 2014/12/08 12:34:52 UTC

Exception Handling with Flume

Hello All,
I am stuck with a problem with Flume version 1.4.0. I am using the 
spooldirectory source with a custom interceptor to process encoded GPS 
files and save them in HDFS and Solr (using the morphline Solr sink). The 
main information is stored in the file name itself as it arrives in the 
spool directory, and the content is irrelevant. So I am using the custom 
interceptor to extract and transform the file header and store the 
extracted data in JSON format as the output of the event.
Here is where my problems come in:

1. When a 0-byte file comes in (files generally come in with a "!" 
symbol as the content), Flume stops and throws an exception. We don't 
need the content of the file in any case, but we still face an exception 
because Flume cannot handle 0-byte files.
2. When the content has some weird characters like !ƒ!, Flume stops 
with an exception.
3. Even when everything is running fine, I am losing some data/events. 
On closer inspection I found that some are available in HDFS but not 
in Solr and vice versa. I am not using any sink group processors such as 
failover or load balancing. Is it because of that?

I want a solution where any exception is handled, the file/data that 
causes the exception is discarded, and Flume moves on to the next file 
in the spool directory. The data comes in at high velocity, around 100 
files every second, so manually deleting the offending file and 
restarting Flume is the regular practice I follow to keep everything 
back on track. But I am sure there must be better ways to handle this 
case. Can you please suggest some better alternatives to my approach?
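
To make the idea concrete, here is a stripped-down sketch of the shape of 
my interceptor, including the drop-on-error behaviour I am after. The class 
and field names below are placeholders (the real KTFlowProcessInterceptor is 
more involved), it assumes the spooldir source's default "file" header, and 
it only guards against exceptions thrown inside the interceptor itself, not 
against files the source cannot read in the first place:

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class GpsFileNameInterceptor implements Interceptor {

    @Override
    public void initialize() {
        // nothing to set up
    }

    @Override
    public Event intercept(Event event) {
        try {
            // With fileHeader = true, the spooldir source puts the absolute
            // path of the spooled file under the "file" header (default key).
            Map<String, String> headers = event.getHeaders();
            String path = headers.get("file");
            if (path == null || path.isEmpty()) {
                return null; // no file name to work with: drop the event
            }
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            // Placeholder for the real decode/transform logic; here the file
            // name is simply wrapped in a trivial JSON document.
            String json = "{\"fileName\":\"" + fileName + "\"}";
            event.setBody(json.getBytes(StandardCharsets.UTF_8));
            return event;
        } catch (Exception e) {
            // Anything unparseable: drop this event and keep processing
            // instead of letting the exception propagate.
            return null;
        }
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> out = new ArrayList<Event>(events.size());
        for (Event e : events) {
            Event intercepted = intercept(e);
            if (intercepted != null) {
                out.add(intercepted);
            }
        }
        return out;
    }

    @Override
    public void close() {
        // nothing to clean up
    }

    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new GpsFileNameInterceptor();
        }

        @Override
        public void configure(Context context) {
            // no custom configuration needed for this sketch
        }
    }
}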

Thanks & Regards,
Souvik Bose

Re: Exception Handling with Flume

Posted by Souvik Bose <so...@delgence.com>.
Hi Hari,
Thank you for replying to my question. You are absolutely right: I am 
using only one channel for both sinks, which is causing the problem. 
Thanks for pointing that out; one problem is solved.
For the spooldirectory source, I am processing the files directly using 
my own custom interceptor. Here is the config for the source:

dnAgent.sources.gpslog.type = spooldir
dnAgent.sources.gpslog.spoolDir = /home/ktspool
dnAgent.sources.gpslog.batchSize = 500
dnAgent.sources.gpslog.channels = MemChannel
dnAgent.sources.gpslog.fileHeader = true
dnAgent.sources.gpslog.deletePolicy = immediate
dnAgent.sources.gpslog.useStrictSpooledFilePolicies = false
dnAgent.sources.gpslog.interceptors = KTFlowProcessInterceptor
dnAgent.sources.gpslog.interceptors.KTFlowProcessInterceptor.type=com.souvikbose.flume.interceptors.KTFlowProcessInterceptor$Builder
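
For the two-channel fix, I plan to split the flow roughly like this (a 
sketch only; the sink and channel names below are placeholders, not the 
ones from my real config):

dnAgent.channels = hdfsChannel solrChannel
dnAgent.channels.hdfsChannel.type = memory
dnAgent.channels.solrChannel.type = memory

# the source writes every event to both channels (default replicating selector)
dnAgent.sources.gpslog.channels = hdfsChannel solrChannel

# each sink drains its own channel
dnAgent.sinks.hdfsSink.channel = hdfsChannel
dnAgent.sinks.solrSink.channel = solrChannel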

Generally this works great when everything is okay. But the problem is 
that the GPS provider doesn't have full control over what comes in, so 
sometimes a blank 0-byte file arrives, which causes Flume to stop 
processing with an exception, and I have to restart Flume manually.

P.S.: I am using Flume 1.4.0 on CDH 4.4.0 on 4 data nodes in EC2.

Thanks & Regards,
Souvik
On 12/8/2014 11:36 PM, Hari Shreedharan wrote:
> You are likely reading from the same channel for both sinks. That 
> means only one sink gets your data. You’d need to have 2 channels 
> connected to the same source, with each sink getting its own channel.
>
> About the Spool Dir not processing data, what format/serializer etc 
> are you using?
>
> Thanks,
> Hari

-- 
With kind regards,



Delgence | Delivering Intelligence
Delivering high quality IT solutions.

Souvik Bose
CIO

Development Office:
Rishi Tech Park, Office No. E-3, Premises No. 02-360, Street No. 360, 
New Town, Rajarhat, Kolkata-700156, India

Europe Office:
Liessentstraat 9a, 5405 AH Uden
The Netherlands

T +91 9831607354 | T +31 616392268 | E Souvik.bose@delgence.com | 
W www.delgence.com



Re: Exception Handling with Flume

Posted by Hari Shreedharan <hs...@cloudera.com>.
You are likely reading from the same channel for both sinks. That means only one sink gets your data. You'd need two channels connected to the same source, with each sink getting its own channel.




About the Spool Dir not processing data, what format/serializer etc are you using?


Thanks,
Hari
