Posted to user@flume.apache.org by Paul Chavez <pc...@verticalsearchworks.com> on 2013/03/01 01:00:21 UTC

Take list full error after 1.3 upgrade

I have a two-tier Flume setup, with 4 agents feeding into 2 'collector' agents that write to HDFS.
 
One of the data flows is hung up after an upgrade and restart with the following error:

3:54:13.497 PM	 ERROR	 org.apache.flume.sink.hdfs.HDFSEventSink	 process failed
org.apache.flume.ChannelException: Take list for FileBackedTransaction, capacity 1000 full, consider committing more frequently, increasing capacity, or increasing thread count. [channel=fc_WebLogs]
	at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:481)
	at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
	at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:386)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:662)

	
3:54:13.498 PM	 ERROR	 org.apache.flume.SinkRunner	 Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: org.apache.flume.ChannelException: Take list for FileBackedTransaction, capacity 1000 full, consider committing more frequently, increasing capacity, or increasing thread count. [channel=fc_WebLogs]
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:461)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.flume.ChannelException: Take list for FileBackedTransaction, capacity 1000 full, consider committing more frequently, increasing capacity, or increasing thread count. [channel=fc_WebLogs]
	at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:481)
	at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
	at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:386)
	... 3 more
	
The relevant part of the config is here:
tier2.sinks.hdfs_WebLogs.type = hdfs
tier2.sinks.hdfs_WebLogs.channel = fc_WebLogs
tier2.sinks.hdfs_WebLogs.hdfs.path = /flume/WebLogs/%Y%m%d/%H%M
tier2.sinks.hdfs_WebLogs.hdfs.round = true
tier2.sinks.hdfs_WebLogs.hdfs.roundValue = 15
tier2.sinks.hdfs_WebLogs.hdfs.roundUnit = minute
tier2.sinks.hdfs_WebLogs.hdfs.rollSize = 67108864
tier2.sinks.hdfs_WebLogs.hdfs.rollCount = 0
tier2.sinks.hdfs_WebLogs.hdfs.rollInterval = 30
tier2.sinks.hdfs_WebLogs.hdfs.batchSize = 10000
tier2.sinks.hdfs_WebLogs.hdfs.fileType = DataStream
tier2.sinks.hdfs_WebLogs.hdfs.writeFormat = Text
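
The channel definition itself isn't included in the excerpt above. For reference, a sketch of what a file channel without an explicit transactionCapacity might look like (the directory paths are invented placeholders; the capacity value matches the metrics quoted later in the thread):

tier2.channels.fc_WebLogs.type = file
tier2.channels.fc_WebLogs.checkpointDir = /flume/checkpoint/fc_WebLogs
tier2.channels.fc_WebLogs.dataDirs = /flume/data/fc_WebLogs
tier2.channels.fc_WebLogs.capacity = 1000000
# transactionCapacity is not set here, so the Flume 1.3 default of 1000
# applies; that is the "capacity 1000" reported in the error above.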

The channel is full, and the metrics page shows many take attempts with no successes. I've been in situations before where the channel was full (usually due to lease issues on HDFS files), but I've never hit this error; usually an agent restart gets things going again.

Any help is appreciated.

Thanks,
Paul Chavez

RE: Take list full error after 1.3 upgrade

Posted by Paul Chavez <pc...@verticalsearchworks.com>.
Did the default channel transaction capacity change from 1.2 to 1.3? It used to be 1 million events by default, and according to the metrics it still looks like it:
 
CHANNEL.fc_WebLogs: {
    "EventPutSuccessCount": "0",
    "ChannelFillPercentage": "99.994",
    "Type": "CHANNEL",
    "StopTime": "0",
    "EventPutAttemptCount": "0",
    "ChannelSize": "999940",
    "StartTime": "1362096361779",
    "EventTakeSuccessCount": "0",
    "ChannelCapacity": "1000000",
    "EventTakeAttemptCount": "22022"
}
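
For readers who want to pull the same counters: these come from Flume's built-in JSON metrics reporting, which is enabled at agent start. A minimal sketch, assuming the agent exposes the HTTP metrics server; the config file name and port here are illustrative only:

flume-ng agent --conf conf --conf-file tier2.conf --name tier2 \
    -Dflume.monitoring.type=http -Dflume.monitoring.port=34545
curl http://localhost:34545/metrics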


________________________________

From: Hari Shreedharan [mailto:hshreedharan@cloudera.com] 
Sent: Thursday, February 28, 2013 4:07 PM
To: user@flume.apache.org
Subject: Re: Take list full error after 1.3 upgrade


You need to increase the transactionCapacity of the channel to at least the batchSize of the HDFS sink. In your case, the channel transaction capacity is 1000 and your HDFS batch size is 10000.

-- 
Hari Shreedharan



RE: Take list full error after 1.3 upgrade

Posted by Paul Chavez <pc...@verticalsearchworks.com>.
Oh, I see the error. You said transaction capacity. It defaults to 1000; I had never configured it explicitly, just relied on the defaults. Setting it to 10000 worked.

Thank you,
Paul Chavez
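
In config terms, the fix Paul describes amounts to a single line on the channel definition, sized to at least the sink's hdfs.batchSize of 10000. A sketch only, since the rest of the channel stanza was never posted in this thread:

# Must be at least hdfs.batchSize (10000) on the attached HDFS sink.
tier2.channels.fc_WebLogs.transactionCapacity = 10000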
 


Re: Take list full error after 1.3 upgrade

Posted by Hari Shreedharan <hs...@cloudera.com>.
You need to increase the transactionCapacity of the channel to at least the batchSize of the HDFS sink. In your case, the channel transaction capacity is 1000 and your HDFS batch size is 10000.

-- 
Hari Shreedharan
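
As a general rule of thumb implied by this answer: a channel's transactionCapacity bounds how many events fit in a single put or take transaction, so it needs to cover the largest batch size of any source or sink attached to that channel. A minimal sketch of a consistent pairing, reusing the names from this thread:

# Each HDFS sink transaction takes up to hdfs.batchSize events into the
# channel's take list, which holds at most transactionCapacity events,
# so transactionCapacity must be >= hdfs.batchSize.
tier2.sinks.hdfs_WebLogs.hdfs.batchSize = 10000
tier2.channels.fc_WebLogs.transactionCapacity = 10000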

