Posted to user@flume.apache.org by terreyshih <te...@gmail.com> on 2015/01/13 08:28:06 UTC

runner.backoffs and runner.backoffs.consecutive ? related to ChannelFullException ?

Hi,

What do the runner.backoffs and runner.backoffs.consecutive counters mean? I get a ChannelFullException shortly afterwards.

thanks,
-Terrey



2015-01-10 02:05:14,500 DEBUG [lifecycleSupervisor-1-1] [o.a.f.l.LifecycleSupervisor.run:214] - checking process:SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@63b8dba5 counterGroup:{ name:null counters:{runner.backoffs.consecutive=8, runner.backoffs=2120} } } supervisoree:{ status:{ lastSeen:1420884311499 lastSeenState:START desiredState:START firstSeen:1420869500254 failures:0 discard:false error:false } policy:org.apache.flume.lifecycle.LifecycleSupervisor$SupervisorPolicy$AlwaysRestartPolicy@2e60639b }



Caused by: org.apache.flume.ChannelFullException: Space for commit to queue couldn't be acquired. Sinks are likely not keeping up with sources, or the buffer size is too tight

Re: runner.backoffs and runner.backoffs.consecutive ? related to ChannelFullException ?

Posted by terreyshih <te...@gmail.com>.
thanks Jeff.

We don't have load-balancing or failover sinks, but your explanation probably describes what is happening here too. In other words, something is wrong with my downstream sink, so the runner keeps backing off and waiting longer before trying again, while events keep buffering in the channel. When the channel reaches its max capacity and the sink still isn't draining it, we get the ChannelFullException. Is my understanding roughly correct?
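
For reference, the channel-side settings involved would look something like this (a sketch with hypothetical agent/channel names; the values are for illustration, not a recommendation):

    # Memory channel sizing (hypothetical names and values).
    # capacity:            max events the channel can buffer
    # transactionCapacity: max events per put/take transaction
    # keep-alive:          seconds a put waits for free space before
    #                      failing, which surfaces as ChannelFullException
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 100000
    agent1.channels.ch1.transactionCapacity = 1000
    agent1.channels.ch1.keep-alive = 3

Raising capacity only buys time, of course; if the sink stays stuck, any finite buffer eventually fills.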

thx

> On Jan 13, 2015, at 10:55 AM, <j....@accenture.com> <j....@accenture.com> wrote:
> 
> [...]
> 
> => Apparently your sink is returning BACKOFF status, and therefore Flume waits more and more (up to the max) before retrying, hoping the sink will recover.
> 
> Jeff
RE: runner.backoffs and runner.backoffs.consecutive ? related to ChannelFullException ?

Posted by j....@accenture.com.
Hi,

As far as I understand, this mechanism lives in the SinkRunner, which is basically a Thread in charge of asking the sink to process the events to be sent.

It is an infinite loop with a backoff mechanism, so that Flume does not spend all its time retrying a failing sink.

If backoff is enabled then the client will temporarily blacklist hosts that fail, causing them to be excluded from being selected as a failover host until a given timeout. When the timeout elapses, if the host is still unresponsive then this is considered a sequential failure, and the timeout is increased exponentially to avoid potentially getting stuck in long waits on unresponsive hosts.

The maximum backoff time can be configured by setting maxBackoff (in milliseconds). The maxBackoff default is 30 seconds (specified in the OrderSelector class that's the superclass of both load balancing strategies). The backoff timeout will increase exponentially with each sequential failure up to the maximum possible backoff timeout. The maximum possible backoff is limited to 65536 seconds (about 18.2 hours).
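
For example, with a load-balancing sink group this behavior is configured like this (a sketch with hypothetical agent/sink names):

    # Hypothetical load-balancing sink group.
    agent1.sinkgroups = g1
    agent1.sinkgroups.g1.sinks = k1 k2
    agent1.sinkgroups.g1.processor.type = load_balance
    # Temporarily blacklist a failed sink instead of retrying it every pass.
    agent1.sinkgroups.g1.processor.backoff = true
    agent1.sinkgroups.g1.processor.selector = round_robin
    # Cap on the exponentially growing blacklist timeout, in milliseconds.
    agent1.sinkgroups.g1.processor.selector.maxTimeOut = 30000
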
A look at the code explains a bit more:

    public void run() {
      logger.debug("Polling sink runner starting");

      while (!shouldStop.get()) {
        try {
          if (policy.process().equals(Sink.Status.BACKOFF)) {
            // The sink reported BACKOFF: count it, then sleep. The sleep
            // grows linearly with the number of consecutive backoffs,
            // capped at maxBackoffSleep.
            counterGroup.incrementAndGet("runner.backoffs");

            Thread.sleep(Math.min(
                counterGroup.incrementAndGet("runner.backoffs.consecutive")
                * backoffSleepIncrement, maxBackoffSleep));
          } else {
            // A successful pass resets the consecutive counter, while
            // runner.backoffs keeps accumulating for the runner's lifetime.
            counterGroup.set("runner.backoffs.consecutive", 0L);
          }
        } catch (InterruptedException e) {
          logger.debug("Interrupted while processing an event. Exiting.");
          counterGroup.incrementAndGet("runner.interruptions");
        } catch (Exception e) {
          logger.error("Unable to deliver event. Exception follows.", e);
          if (e instanceof EventDeliveryException) {
            counterGroup.incrementAndGet("runner.deliveryErrors");
          } else {
            counterGroup.incrementAndGet("runner.errors");
          }
          // Any other failure: sleep the maximum backoff before retrying.
          try {
            Thread.sleep(maxBackoffSleep);
          } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
          }
        }
      }
      logger.debug("Polling runner exiting. Metrics:{}", counterGroup);
    }

  }
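
To make the sleep formula concrete, here is a tiny standalone sketch. It assumes the increment is 1 second and the cap 5 seconds, which is what the SinkRunner source I am looking at hardcodes (your Flume version may differ):

    // Standalone illustration of the backoff sleep above; the increment
    // and cap are assumed SinkRunner defaults, not read from Flume.
    public class BackoffSleepDemo {
      public static void main(String[] args) {
        final long backoffSleepIncrement = 1000L; // ms added per consecutive backoff
        final long maxBackoffSleep = 5000L;       // upper bound on the sleep
        for (long consecutive = 1; consecutive <= 8; consecutive++) {
          long sleep = Math.min(consecutive * backoffSleepIncrement, maxBackoffSleep);
          System.out.printf("runner.backoffs.consecutive=%d -> sleep %d ms%n",
              consecutive, sleep);
        }
      }
    }

With runner.backoffs.consecutive=8, as in your log, the runner is already sleeping at the cap between polls, so the channel can keep filling until a put times out with ChannelFullException.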


=> Apparently your sink is returning BACKOFF status, and therefore Flume waits more and more (up to the max) before retrying, hoping the sink will recover.

Jeff

From: terreyshih [mailto:terreyshih@gmail.com]
Sent: mardi 13 janvier 2015 19:20
To: user@flume.apache.org
Subject: runner.backoffs and runner.backoffs.consecutive ? related to ChannelFullException ?

Can anybody shed some light on this? thx

On Jan 12, 2015, at 11:28 PM, terreyshih <te...@gmail.com> wrote:

[...]



runner.backoffs and runner.backoffs.consecutive ? related to ChannelFullException ?

Posted by terreyshih <te...@gmail.com>.
Can anybody shed some light on this? thx

> On Jan 12, 2015, at 11:28 PM, terreyshih <te...@gmail.com> wrote:
> 
> [...]