You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by Juhani Connolly <ju...@gmail.com> on 2012/03/22 09:23:00 UTC

Review Request: FLUME-1030, handle sinks with exceptions by putting them on a cooldown

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4445/
-----------------------------------------------------------

Review request for Flume.


Summary
-------

As discussed in the JIRA item, I modified FailoverSink to deal with all exceptions.
Now a sink that fails will be put onto a failed links queue, from which a recovery will be attempted after a timeout. Each sequential failure the timeout will increase. I am open to other methods of increasing the timeout(maybe add on a ceiling?)


This addresses bug FLUME-1030.
    https://issues.apache.org/jira/browse/FLUME-1030


Diffs
-----

  flume-ng-core/src/test/java/org/apache/flume/sink/TestFailoverSinkProcessor.java 195c121 
  flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java 7eada57 

Diff: https://reviews.apache.org/r/4445/diff


Testing
-------

Modified the test for the new functionality, new test passes

No other tests should be affected, but my environment was having some weird problems. I'll look into them tomorrow, just leaving this up so people can have a browse and will confirm tests passing tomorrow


Thanks,

Juhani


Re: Review Request: FLUME-1030, handle sinks with exceptions by putting them on a cooldown

Posted by Juhani Connolly <ju...@gmail.com>.

> On 2012-03-22 17:07:01, Arvind Prabhakar wrote:
> > Thanks for the patch Juhani. I was able to run the tests successfully. I have some minor feedback below for your consideration.

thanks for running the tests. back to normal on my end too


> On 2012-03-22 17:07:01, Arvind Prabhakar wrote:
> > flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java, line 90
> > <https://reviews.apache.org/r/4445/diff/1/?file=94575#file94575line90>
> >
> >     It will be good to cap this penalty amount to a predefined/configured ceiling value.

Added a config variable


> On 2012-03-22 17:07:01, Arvind Prabhakar wrote:
> > flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java, lines 120-121
> > <https://reviews.apache.org/r/4445/diff/1/?file=94575#file94575line120>
> >
> >     There is one slight issue here though - which is if the channel is empty, the sink being attempted to recover will likely return BACKOFF, implying that the sink is normal and has recovered. 
> >     
> >     A minor nit: it will be nice if the process invocation on the failed sink was from within the process() that calls the active Sink. That way the logic stays in one place.

I got rid of the queue subclass and put the code in process... Though I'm not sure if that is the easiest way for the human brain to parse it...

I also changed things so that a backoff results in being returned to the failed list without a penalty


- Juhani


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4445/#review6231
-----------------------------------------------------------


On 2012-03-22 08:23:00, Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4445/
> -----------------------------------------------------------
> 
> (Updated 2012-03-22 08:23:00)
> 
> 
> Review request for Flume.
> 
> 
> Summary
> -------
> 
> As discussed in the JIRA item, I modified FailoverSink to deal with all exceptions.
> Now a sink that fails will be put onto a failed links queue, from which a recovery will be attempted after a timeout. Each sequential failure the timeout will increase. I am open to other methods of increasing the timeout(maybe add on a ceiling?)
> 
> 
> This addresses bug FLUME-1030.
>     https://issues.apache.org/jira/browse/FLUME-1030
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/test/java/org/apache/flume/sink/TestFailoverSinkProcessor.java 195c121 
>   flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java 7eada57 
> 
> Diff: https://reviews.apache.org/r/4445/diff
> 
> 
> Testing
> -------
> 
> Modified the test for the new functionality, new test passes
> 
> No other tests should be affected, but my environment was having some weird problems. I'll look into them tomorrow, just leaving this up so people can have a browse and will confirm tests passing tomorrow
> 
> 
> Thanks,
> 
> Juhani
> 
>


Re: Review Request: FLUME-1030, handle sinks with exceptions by putting them on a cooldown

Posted by Arvind Prabhakar <ar...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4445/#review6231
-----------------------------------------------------------


Thanks for the patch Juhani. I was able to run the tests successfully. I have some minor feedback below for your consideration.


flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java
<https://reviews.apache.org/r/4445/#comment13432>

    It will be good to cap this penalty amount to a predefined/configured ceiling value.



flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java
<https://reviews.apache.org/r/4445/#comment13438>

    There is one slight issue here though - which is if the channel is empty, the sink being attempted to recover will likely return BACKOFF, implying that the sink is normal and has recovered. 
    
    A minor nit: it will be nice if the process invocation on the failed sink was from within the process() that calls the active Sink. That way the logic stays in one place.


- Arvind


On 2012-03-22 08:23:00, Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4445/
> -----------------------------------------------------------
> 
> (Updated 2012-03-22 08:23:00)
> 
> 
> Review request for Flume.
> 
> 
> Summary
> -------
> 
> As discussed in the JIRA item, I modified FailoverSink to deal with all exceptions.
> Now a sink that fails will be put onto a failed links queue, from which a recovery will be attempted after a timeout. Each sequential failure the timeout will increase. I am open to other methods of increasing the timeout(maybe add on a ceiling?)
> 
> 
> This addresses bug FLUME-1030.
>     https://issues.apache.org/jira/browse/FLUME-1030
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/test/java/org/apache/flume/sink/TestFailoverSinkProcessor.java 195c121 
>   flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java 7eada57 
> 
> Diff: https://reviews.apache.org/r/4445/diff
> 
> 
> Testing
> -------
> 
> Modified the test for the new functionality, new test passes
> 
> No other tests should be affected, but my environment was having some weird problems. I'll look into them tomorrow, just leaving this up so people can have a browse and will confirm tests passing tomorrow
> 
> 
> Thanks,
> 
> Juhani
> 
>


Re: Review Request: FLUME-1030, handle sinks with exceptions by putting them on a cooldown

Posted by Juhani Connolly <ju...@gmail.com>.

> On 2012-03-23 03:48:14, Arvind Prabhakar wrote:
> > flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java, line 193
> > <https://reviews.apache.org/r/4445/diff/2/?file=94789#file94789line193>
> >
> >     Log the exception.

Did this, changed EventDeliveryException->Exception


> On 2012-03-23 03:48:14, Arvind Prabhakar wrote:
> > flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java, line 182
> > <https://reviews.apache.org/r/4445/diff/2/?file=94789#file94789line182>
> >
> >     It will be good to log this exception so we have a trace of the failures that are happening.

Was just using it to guard against null and number exceptions at the same time. Separated it out and checked for null. Logging the Number exception because it's probably a typo in config


> On 2012-03-23 03:48:14, Arvind Prabhakar wrote:
> > flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java, line 138
> > <https://reviews.apache.org/r/4445/diff/2/?file=94789#file94789line138>
> >
> >     This seems like a typo. Perhaps you want to do something like
> >     
> >     maxPenalty = context.getInteger(CONF_KEY_MAX_PENALTY, DEFAULT_MAX_PENALTY);
> >     
> >

Yes. Probably not enough sleep. I checked in the debugger that things were getting correctly set/read now


> On 2012-03-23 03:48:14, Arvind Prabhakar wrote:
> > flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java, lines 94-95
> > <https://reviews.apache.org/r/4445/diff/2/?file=94789#file94789line94>
> >
> >     The max penalty calculated during configure of the sink processor should be applied here to enforce the ceiling.

Done


On 2012-03-23 03:48:14, Juhani Connolly wrote:
> > Rest of the changes look good to me.

Wow, what a mess. Must've been tired or something.

The name of the max limit also had a period on its end that I removed.


- Juhani


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4445/#review6265
-----------------------------------------------------------


On 2012-03-23 02:41:03, Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4445/
> -----------------------------------------------------------
> 
> (Updated 2012-03-23 02:41:03)
> 
> 
> Review request for Flume.
> 
> 
> Summary
> -------
> 
> As discussed in the JIRA item, I modified FailoverSink to deal with all exceptions.
> Now a sink that fails will be put onto a failed links queue, from which a recovery will be attempted after a timeout. Each sequential failure the timeout will increase. I am open to other methods of increasing the timeout(maybe add on a ceiling?)
> 
> 
> This addresses bug FLUME-1030.
>     https://issues.apache.org/jira/browse/FLUME-1030
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/test/java/org/apache/flume/sink/TestFailoverSinkProcessor.java 195c121 
>   flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java 7eada57 
> 
> Diff: https://reviews.apache.org/r/4445/diff
> 
> 
> Testing
> -------
> 
> Modified the test for the new functionality, new test passes
> 
> No other tests should be affected, but my environment was having some weird problems. I'll look into them tomorrow, just leaving this up so people can have a browse and will confirm tests passing tomorrow
> 
> 
> Thanks,
> 
> Juhani
> 
>


Re: Review Request: FLUME-1030, handle sinks with exceptions by putting them on a cooldown

Posted by Arvind Prabhakar <ar...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4445/#review6265
-----------------------------------------------------------


Thanks for making the changes Juhani. Some feedback follows.


flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java
<https://reviews.apache.org/r/4445/#comment13571>

    The max penalty calculated during configure of the sink processor should be applied here to enforce the ceiling.



flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java
<https://reviews.apache.org/r/4445/#comment13570>

    This seems like a typo. Perhaps you want to do something like
    
    maxPenalty = context.getInteger(CONF_KEY_MAX_PENALTY, DEFAULT_MAX_PENALTY);
    
    



flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java
<https://reviews.apache.org/r/4445/#comment13568>

    It will be good to log this exception so we have a trace of the failures that are happening.



flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java
<https://reviews.apache.org/r/4445/#comment13569>

    Log the exception.


Rest of the changes look good to me.

- Arvind


On 2012-03-23 02:41:03, Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4445/
> -----------------------------------------------------------
> 
> (Updated 2012-03-23 02:41:03)
> 
> 
> Review request for Flume.
> 
> 
> Summary
> -------
> 
> As discussed in the JIRA item, I modified FailoverSink to deal with all exceptions.
> Now a sink that fails will be put onto a failed links queue, from which a recovery will be attempted after a timeout. Each sequential failure the timeout will increase. I am open to other methods of increasing the timeout(maybe add on a ceiling?)
> 
> 
> This addresses bug FLUME-1030.
>     https://issues.apache.org/jira/browse/FLUME-1030
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/test/java/org/apache/flume/sink/TestFailoverSinkProcessor.java 195c121 
>   flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java 7eada57 
> 
> Diff: https://reviews.apache.org/r/4445/diff
> 
> 
> Testing
> -------
> 
> Modified the test for the new functionality, new test passes
> 
> No other tests should be affected, but my environment was having some weird problems. I'll look into them tomorrow, just leaving this up so people can have a browse and will confirm tests passing tomorrow
> 
> 
> Thanks,
> 
> Juhani
> 
>


Re: Review Request: FLUME-1030, handle sinks with exceptions by putting them on a cooldown

Posted by Arvind Prabhakar <ar...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4445/#review6289
-----------------------------------------------------------

Ship it!


+1

- Arvind


On 2012-03-23 05:28:45, Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4445/
> -----------------------------------------------------------
> 
> (Updated 2012-03-23 05:28:45)
> 
> 
> Review request for Flume.
> 
> 
> Summary
> -------
> 
> As discussed in the JIRA item, I modified FailoverSink to deal with all exceptions.
> Now a sink that fails will be put onto a failed links queue, from which a recovery will be attempted after a timeout. Each sequential failure the timeout will increase. I am open to other methods of increasing the timeout(maybe add on a ceiling?)
> 
> 
> This addresses bug FLUME-1030.
>     https://issues.apache.org/jira/browse/FLUME-1030
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/test/java/org/apache/flume/sink/TestFailoverSinkProcessor.java 195c121 
>   flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java 7eada57 
> 
> Diff: https://reviews.apache.org/r/4445/diff
> 
> 
> Testing
> -------
> 
> Modified the test for the new functionality, new test passes
> 
> No other tests should be affected, but my environment was having some weird problems. I'll look into them tomorrow, just leaving this up so people can have a browse and will confirm tests passing tomorrow
> 
> 
> Thanks,
> 
> Juhani
> 
>


Re: Review Request: FLUME-1030, handle sinks with exceptions by putting them on a cooldown

Posted by Juhani Connolly <ju...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4445/
-----------------------------------------------------------

(Updated 2012-03-23 05:28:45.159536)


Review request for Flume.


Changes
-------

Fixed suggested changes and also added some javadoc describing functioning and new config setting.


Summary
-------

As discussed in the JIRA item, I modified FailoverSink to deal with all exceptions.
Now a sink that fails will be put onto a failed links queue, from which a recovery will be attempted after a timeout. Each sequential failure the timeout will increase. I am open to other methods of increasing the timeout(maybe add on a ceiling?)


This addresses bug FLUME-1030.
    https://issues.apache.org/jira/browse/FLUME-1030


Diffs (updated)
-----

  flume-ng-core/src/test/java/org/apache/flume/sink/TestFailoverSinkProcessor.java 195c121 
  flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java 7eada57 

Diff: https://reviews.apache.org/r/4445/diff


Testing
-------

Modified the test for the new functionality, new test passes

No other tests should be affected, but my environment was having some weird problems. I'll look into them tomorrow, just leaving this up so people can have a browse and will confirm tests passing tomorrow


Thanks,

Juhani


Re: Review Request: FLUME-1030, handle sinks with exceptions by putting them on a cooldown

Posted by Juhani Connolly <ju...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4445/
-----------------------------------------------------------

(Updated 2012-03-23 02:41:03.472482)


Review request for Flume.


Changes
-------

Wasn't applying the penalty limit, fixed now


Summary
-------

As discussed in the JIRA item, I modified FailoverSink to deal with all exceptions.
Now a sink that fails will be put onto a failed links queue, from which a recovery will be attempted after a timeout. Each sequential failure the timeout will increase. I am open to other methods of increasing the timeout(maybe add on a ceiling?)


This addresses bug FLUME-1030.
    https://issues.apache.org/jira/browse/FLUME-1030


Diffs (updated)
-----

  flume-ng-core/src/test/java/org/apache/flume/sink/TestFailoverSinkProcessor.java 195c121 
  flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java 7eada57 

Diff: https://reviews.apache.org/r/4445/diff


Testing
-------

Modified the test for the new functionality, new test passes

No other tests should be affected, but my environment was having some weird problems. I'll look into them tomorrow, just leaving this up so people can have a browse and will confirm tests passing tomorrow


Thanks,

Juhani


Re: Review Request: FLUME-1030, handle sinks with exceptions by putting them on a cooldown

Posted by Juhani Connolly <ju...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4445/
-----------------------------------------------------------

(Updated 2012-03-23 01:48:59.198037)


Review request for Flume.


Changes
-------

Updated with the suggested changes.

All tests pass


Summary
-------

As discussed in the JIRA item, I modified FailoverSink to deal with all exceptions.
Now a sink that fails will be put onto a failed links queue, from which a recovery will be attempted after a timeout. Each sequential failure the timeout will increase. I am open to other methods of increasing the timeout(maybe add on a ceiling?)


This addresses bug FLUME-1030.
    https://issues.apache.org/jira/browse/FLUME-1030


Diffs (updated)
-----

  flume-ng-core/src/test/java/org/apache/flume/sink/TestFailoverSinkProcessor.java 195c121 
  flume-ng-core/src/main/java/org/apache/flume/sink/FailoverSinkProcessor.java 7eada57 

Diff: https://reviews.apache.org/r/4445/diff


Testing
-------

Modified the test for the new functionality, new test passes

No other tests should be affected, but my environment was having some weird problems. I'll look into them tomorrow, just leaving this up so people can have a browse and will confirm tests passing tomorrow


Thanks,

Juhani