Posted to user@flume.apache.org by Matt Wise <ma...@nextdoor.com> on 2013/05/10 19:29:21 UTC

How to get a bad message out of the channel?

We were messing around with a few settings today and ended up getting a few messages into our channel that are bad (corrupt time field). How can I clear them out?

> 10 May 2013 17:28:26,920 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:160)  - Unable to deliver event. Exception follows.
> org.apache.flume.EventDeliveryException: java.lang.RuntimeException: Flume wasn't able to parse timestamp header in the event to resolve time based bucketing. Please check that you're correctly populating timestamp header (for example using TimestampInterceptor source interceptor).
> 	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:461)
> 	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> 	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> 	at java.lang.Thread.run(Thread.java:679)
> Caused by: java.lang.RuntimeException: Flume wasn't able to parse timestamp header in the event to resolve time based bucketing. Please check that you're correctly populating timestamp header (for example using TimestampInterceptor source interceptor).
> 	at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:160)
> 	at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:343)
> 	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
> 	... 3 more
> Caused by: java.lang.NumberFormatException: null
> 	at java.lang.Long.parseLong(Long.java:401)
> 	at java.lang.Long.valueOf(Long.java:535)
> 	at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:158)
> 	... 5 more

This message just keeps repeating over and over again. New events are coming through just fine.

Re: How to get a bad message out of the channel?

Posted by Mike Percy <mp...@apache.org>.
Depends on your definition of "bad". I can give more useful advice if you
can cite specific problem scenarios.

Mike



On Thu, May 23, 2013 at 9:49 AM, Matt Wise <ma...@nextdoor.com> wrote:

> Mike,
>   We do that, but somehow we ended up with an event or two in the
> pipeline that were bad. It would be really nice if there were some way to
> choose what to do when a bad event is found, rather than letting the
> pipeline fill up quickly. For example:
>    a) Dump the event to a data file and log a warning
>    b) Throw the event away
>    c) Move the event to an alternate channel where it can be handled
> differently
>
>   Anything other than "stop pulling data from the channel and let the
> channel fill".
>
> --Matt

Re: How to get a bad message out of the channel?

Posted by Matt Wise <ma...@nextdoor.com>.
Mike,
  We do that, but somehow we ended up with an event or two in the pipeline that were bad. It would be really nice if there were some way to choose what to do when a bad event is found, rather than letting the pipeline fill up quickly. For example:
   a) Dump the event to a data file and log a warning
   b) Throw the event away
   c) Move the event to an alternate channel where it can be handled differently

  Anything other than "stop pulling data from the channel and let the channel fill".

--Matt

On May 22, 2013, at 12:39 AM, Mike Percy <mp...@apache.org> wrote:

> Hi Matt,
> Nope, there is currently no way to do that. But you could use the timestamp interceptor to make sure your events always have those headers.
> 
> Mike


Re: How to get a bad message out of the channel?

Posted by Mike Percy <mp...@apache.org>.
Hi Matt,
Nope, there is currently no way to do that. But you could use the timestamp
interceptor to make sure your events always have those headers.

Mike
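
For anyone following along, here is a minimal sketch of the timestamp interceptor setup, assuming a hypothetical agent named a1 with a source named r1 (placeholder names, not taken from this thread). Interceptors run on the source, so this only helps events that arrive after the change; events already sitting in the channel without the header are not touched.

```properties
# Hypothetical agent "a1" and source "r1"; substitute your own names.
a1.sources.r1.interceptors = ts
# The built-in timestamp interceptor adds a "timestamp" header (epoch milliseconds).
a1.sources.r1.interceptors.ts.type = timestamp
# false is the default: any existing timestamp header is overwritten.
a1.sources.r1.interceptors.ts.preserveExisting = false
```

Leaving preserveExisting at false means every event gets a fresh, parseable header at the source, at the cost of discarding any upstream timestamp.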


On Mon, May 13, 2013 at 12:13 PM, Matt Wise <ma...@nextdoor.com> wrote:

> Great, that's working, thank you. Is there a way to give the HDFS plugin a
> 'failsafe' path to write messages to when they are missing that kind of
> data?
>
> --Matt

Re: How to get a bad message out of the channel?

Posted by Matt Wise <ma...@nextdoor.com>.
Great, that's working, thank you. Is there a way to give the HDFS plugin a 'failsafe' path to write messages to when they are missing that kind of data?

--Matt

On May 10, 2013, at 6:30 PM, Mike Percy <mp...@apache.org> wrote:

> Hook up an HDFS sink to them that doesn't use %Y, %m, etc. in the configured path.
> 
> HTH,
> Mike


Re: How to get a bad message out of the channel?

Posted by Mike Percy <mp...@apache.org>.
Hook up an HDFS sink to them that doesn't use %Y, %m, etc. in the configured path.

HTH,
Mike
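
A minimal sketch of such a drain sink, assuming a hypothetical agent a1, a sink named recovery, and the set-aside file channel configured as c1 (all placeholder names; the HDFS URL is also only an example). Because the path contains no time escapes, the sink never needs to read the timestamp header.

```properties
# Hypothetical names: agent "a1", sink "recovery", channel "c1".
a1.sinks = recovery
a1.sinks.recovery.type = hdfs
a1.sinks.recovery.channel = c1
# Fixed path with no %Y, %m, %d or other time escapes, so events
# without a timestamp header can still be written.
a1.sinks.recovery.hdfs.path = hdfs://namenode/flume/recovered
```

Once the channel has drained, the recovered files can be sorted into the dated directories by hand if needed.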

On May 10, 2013, at 11:00 AM, Matt Wise <ma...@nextdoor.com> wrote:

> Eek, this was worse than I thought. It turns out messages continued to be added to the channels, but no transactions could complete to take messages out of them. I've moved the file channels out of the way and restarted the service for now, but how can I recover the rest of the data in these file channels?

Re: How to get a bad message out of the channel?

Posted by Matt Wise <ma...@nextdoor.com>.
Eek, this was worse than I thought. It turns out messages continued to be added to the channels, but no transactions could complete to take messages out of them. I've moved the file channels out of the way and restarted the service for now, but how can I recover the rest of the data in these file channels?
