Posted to user@ambari.apache.org by cs user <ac...@gmail.com> on 2016/05/17 11:44:39 UTC

Flume - always unable to stop 2 flume agents

Hello,

We have 12 flume agents. Whenever we change the config and need to restart
the affected nodes, we always end up with 2 flume agents which refuse to
stop; it takes multiple restart attempts (sometimes as long as 45 minutes)
to eventually stop them.

Has anyone else seen this? Is there a workaround?

Thanks!

Re: Flume - always unable to stop 2 flume agents

Posted by Srimanth Gunturi <sg...@hortonworks.com>.
Hello,

If the Flume agents are receiving the shutdown request but are not stopping, I would suggest discussing this on the Flume mailing lists at https://flume.apache.org/mailinglists.html.

Regards,

Srimanth




________________________________
From: cs user <ac...@gmail.com>
Sent: Wednesday, May 18, 2016 12:54 AM
To: user@ambari.apache.org
Subject: Re: Flume - always unable to stop 2 flume agents

Hi Srimanth,

Thanks for responding. I've checked the logs and it seems that the shutdown event is received, and some of the channels (we have 3) are closed, but the agent just continues to run. For example I can see entries like:

18 May 2016 08:13:21,142 INFO  [agent-shutdown-hook] (com.aweber.flume.source.rabbitmq.RabbitMQSource.stop:117)  - Stopping channel1-source
18 May 2016 08:13:21,142 INFO  [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:149)  - Component type: SOURCE, name: channel1-source stopped

But it looks like it continues to process events. I can see entries like this repeated over and over; note that this is around 30 minutes after it tried to stop:

18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.run:143)  - Attributes for component SOURCE.channel1-source
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - EventReceivedCount = 36417
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - AppendBatchAcceptedCount = 0
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - EventAcceptedCount = 36417
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - AppendReceivedCount = 0
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - StartTime = 1463486595420
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - AppendAcceptedCount = 0
18 May 2016 08:47:06,779 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - OpenConnectionCount = 2
18 May 2016 08:47:06,779 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - AppendBatchReceivedCount = 0
18 May 2016 08:47:06,779 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - StopTime = 1463555601142

Is this normal behavior?
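
My best guess at the mechanism (this is only a self-contained sketch, not Flume's or the plugin's actual code): if any one component's stop() blocks inside the agent-shutdown-hook, the hook never completes, so the JVM never halts, and unrelated threads such as the metrics poller keep logging exactly like the output above.

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class StuckShutdownHookDemo {

    public static void main(String[] args) throws Exception {
        // Stand-in for the metrics poller: keeps reporting every few seconds.
        ScheduledExecutorService metrics = Executors.newSingleThreadScheduledExecutor();
        metrics.scheduleAtFixedRate(
                () -> System.out.println("metrics: still collecting attributes..."),
                0, 3, TimeUnit.SECONDS);

        // Stand-in for a component whose stop() never returns,
        // e.g. one waiting forever on an acknowledgement or a connection close.
        CountDownLatch neverReleased = new CountDownLatch(1);

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("agent-shutdown-hook: Stopping channel1-source");
            System.out.println("agent-shutdown-hook: channel1-source stopped");
            System.out.println("agent-shutdown-hook: stopping next component...");
            try {
                neverReleased.await();   // blocks forever, so the hook never finishes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "agent-shutdown-hook"));

        System.out.println("Running. Send SIGTERM (kill <pid>): the hook starts but never");
        System.out.println("finishes, so the process stays up and metrics keep printing.");
        Thread.currentThread().join();   // park the main thread until shutdown
    }
}

If that is what is happening here, a thread dump (jstack <pid>) taken while the agent refuses to stop should show the agent-shutdown-hook thread blocked inside some component's stop().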

We are using this plugin:

https://github.com/aweber/rabbitmq-flume-plugin

I have thought about switching to this plugin to see if the problem goes away:

https://github.com/jcustenborder/flume-ng-rabbitmq
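
As far as I can tell the swap itself would only be a source-type change in the agent config. The snippet below is illustrative only: the agent name is a placeholder, channel1-source is the source name from our logs, the aweber class is the one visible in the log above, the jcustenborder class name is my assumption and should be verified against that repo, and the two plugins use different connection properties, so both READMEs need checking either way.

# current source (aweber/rabbitmq-flume-plugin)
agent.sources.channel1-source.type = com.aweber.flume.source.rabbitmq.RabbitMQSource

# candidate replacement (jcustenborder/flume-ng-rabbitmq)
# agent.sources.channel1-source.type = org.apache.flume.source.rabbitmq.RabbitMQSource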

Thanks!






On Tue, May 17, 2016 at 5:29 PM, Srimanth Gunturi <sg...@hortonworks.com> wrote:

Hello,

Could you please describe the setup a little bit more? Are 12 flume agents on 12 different hosts or on a single host?

Also, have you looked at the flume logs for those 2 agents to determine what is going on during the 45 minutes?

Regards,

Srimanth


________________________________
From: cs user <ac...@gmail.com>
Sent: Tuesday, May 17, 2016 4:44 AM
To: user@ambari.apache.org
Subject: Flume - always unable to stop 2 flume agents

Hello,

We have 12 flume agents. Whenever we change the config and need to restart the affected nodes, we always end up with 2 flume agents which refuse to stop; it takes multiple restart attempts (sometimes as long as 45 minutes) to eventually stop them.

Has anyone else seen this? Is there a workaround?

Thanks!


Re: Flume - always unable to stop 2 flume agents

Posted by cs user <ac...@gmail.com>.
Hi Srimanth,

Thanks for responding. I've checked the logs and it seems that the shutdown
event is received, and some of the channels (we have 3) are closed, but the
agent just continues to run. For example I can see entries like:

18 May 2016 08:13:21,142 INFO  [agent-shutdown-hook] (com.aweber.flume.source.rabbitmq.RabbitMQSource.stop:117)  - Stopping channel1-source
18 May 2016 08:13:21,142 INFO  [agent-shutdown-hook] (org.apache.flume.instrumentation.MonitoredCounterGroup.stop:149)  - Component type: SOURCE, name: channel1-source stopped

But it looks like it continues to process events. I can see entries like
this repeated over and over; note that this is around 30 minutes after it
tried to stop:

18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.run:143)  - Attributes for component SOURCE.channel1-source
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - EventReceivedCount = 36417
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - AppendBatchAcceptedCount = 0
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - EventAcceptedCount = 36417
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - AppendReceivedCount = 0
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - StartTime = 1463486595420
18 May 2016 08:47:06,778 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - AppendAcceptedCount = 0
18 May 2016 08:47:06,779 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - OpenConnectionCount = 2
18 May 2016 08:47:06,779 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - AppendBatchReceivedCount = 0
18 May 2016 08:47:06,779 INFO  [pool-57-thread-1] (org.apache.hadoop.metrics2.sink.flume.FlumeTimelineMetricsSink$TimelineMetricsCollector.processComponentAttributes:163)  - StopTime = 1463555601142

Is this normal behavior?

We are using this plugin:

https://github.com/aweber/rabbitmq-flume-plugin

I have thought about switching to this plugin to see if the problem goes away:

https://github.com/jcustenborder/flume-ng-rabbitmq

Thanks!






On Tue, May 17, 2016 at 5:29 PM, Srimanth Gunturi <sg...@hortonworks.com>
wrote:

> Hello,
>
> Could you please describe the setup a little bit more? Are 12 flume agents
> on 12 different hosts or on a single host?
>
> Also, have you looked at the flume logs for those 2 agents to
> determine what is going on during the 45 minutes?
>
> Regards,
>
> Srimanth
>
>
> ------------------------------
> *From:* cs user <ac...@gmail.com>
> *Sent:* Tuesday, May 17, 2016 4:44 AM
> *To:* user@ambari.apache.org
> *Subject:* Flume - always unable to stop 2 flume agents
>
> Hello,
>
> We have 12 flume agents. Whenever we change the config and need to restart
> the affected nodes, we always end up with 2 flume agents which refuse to
> stop; it takes multiple restart attempts (sometimes as long as 45 minutes)
> to eventually stop them.
>
> Has anyone else seen this? Is there a workaround?
>
> Thanks!
>

Re: Flume - always unable to stop 2 flume agents

Posted by Srimanth Gunturi <sg...@hortonworks.com>.
Hello,

Could you please describe the setup a little bit more? Are 12 flume agents on 12 different hosts or on a single host?

Also, have you looked at the flume logs for those 2 agents to determine what is going on during the 45 minutes?

Regards,

Srimanth


________________________________
From: cs user <ac...@gmail.com>
Sent: Tuesday, May 17, 2016 4:44 AM
To: user@ambari.apache.org
Subject: Flume - always unable to stop 2 flume agents

Hello,

We have 12 flume agents. Whenever we change the config and need to restart the affected nodes, we always end up with 2 flume agents which refuse to stop; it takes multiple restart attempts (sometimes as long as 45 minutes) to eventually stop them.

Has anyone else seen this? Is there a workaround?

Thanks!