Posted to user@chukwa.apache.org by Logan Hardy <lo...@33across.com> on 2012/11/10 23:17:50 UTC

WaitingQueue - MemLimitQueue is full

We are running CentOS 5.4, Chukwa 0.3.0, and java version "1.6.0_17", and are
feeding a steady stream of data into our CDH3u3 Hadoop cluster. We have 6
Chukwa agent machines feeding 3 Chukwa collectors. Any time the cluster
gets busy with a big job or with decommissioning a node, the Chukwa
agents and collectors start to back up and I start seeing "WaitingQueue -
MemLimitQueue is full" messages in agent.log, as shown below. As soon as
Hadoop cluster activity dies down, the MemLimitQueue messages go away and
everything goes back to normal.

[root@COLL5 chukwa]# ps auxf | grep chukwa
root     11258  0.0  0.0  61172   732 pts/0    S+   15:15   0:00
 \_ grep chukwa
root     29248  1.2  2.1 415572 86928 ?        Sl   04:03   8:04
/usr/java/default/bin/java -Xms32M -Xmx64M -DAPP=agent
-Dlog4j.configuration=chukwa-log4j.properties
-DCHUKWA_HOME=/usr/local/chukwa/bin/..
-DCHUKWA_CONF_DIR=/usr/local/chukwa/bin/../conf
-DCHUKWA_LOG_DIR=/usr/local/chukwa/logs -classpath
/usr/local/chukwa/bin/../conf::/usr/local/chukwa/bin/../chukwa-agent-0.3.0.jar:/usr/local/chukwa/bin/../chukwa-core-0.3.0.jar:/usr/local/chukwa/bin/../hadoopjars/hadoop-0.20.0-core.jar:/usr/local/chukwa/bin/../lib/NagiosAppender-1.5.0.jar:/usr/local/chukwa/bin/../lib/ant-1.7.1.jar:/usr/local/chukwa/bin/../lib/ant-launcher-1.7.1.jar:/usr/local/chukwa/bin/../lib/asm-3.1.jar:/usr/local/chukwa/bin/../lib/commons-beanutils-1.8.0.jar:/usr/local/chukwa/bin/../lib/commons-cli-2.0-SNAPSHOT.jar:/usr/local/chukwa/bin/../lib/commons-codec-1.3.jar:/usr/local/chukwa/bin/../lib/commons-collections-3.1.jar:/usr/local/chukwa/bin/../lib/commons-fileupload-1.2.jar:/usr/local/chukwa/bin/../lib/commons-httpclient-3.0.1.jar:/usr/local/chukwa/bin/../lib/commons-io-1.4.jar:/usr/local/chukwa/bin/../lib/commons-lang-2.4.jar:/usr/local/chukwa/bin/../lib/commons-logging-1.1.1.jar:/usr/local/chukwa/bin/../lib/commons-logging-api-1.0.4.jar:/usr/local/chukwa/bin/../lib/commons-net-1.4.1.jar:/usr/local/chukwa/bin/../lib/core-3.1.1.jar:/usr/local/chukwa/bin/../lib/ezmorph-1.0.6.jar:/usr/local/chukwa/bin/../lib/jchronic-0.2.3.jar:/usr/local/chukwa/bin/../lib/jersey-bundle-1.1.0-ea.jar:/usr/local/chukwa/bin/../lib/jetty-6.1.11.jar:/usr/local/chukwa/bin/../lib/jetty-util-6.1.11.jar:/usr/local/chukwa/bin/../lib/json-lib-2.2.3-jdk15.jar:/usr/local/chukwa/bin/../lib/json.jar:/usr/local/chukwa/bin/../lib/jsp-2.1-6.1.11.jar:/usr/local/chukwa/bin/../lib/jsp-api-2.1-6.1.11.jar:/usr/local/chukwa/bin/../lib/jsr311-api-1.0.jar:/usr/local/chukwa/bin/../lib/junit-3.8.1.jar:/usr/local/chukwa/bin/../lib/log4j-1.2.13.jar:/usr/local/chukwa/bin/../lib/mysql-connector-java-5.1.6.jar:/usr/local/chukwa/bin/../lib/prefuse.jar:/usr/local/chukwa/bin/../lib/servlet-api-2.5-6.1.11.jar
org.apache.hadoop.chukwa.datacollection.agent.ChukwaAgent


agent.log
........
2012-11-10 14:56:14,470 INFO Timer-0 ChukwaAgent - writing checkpoint 7257
2012-11-10 14:56:18,655 INFO Timer-1 HttpConnector - # http chunks ACK'ed since last report: 547
2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response length 832
2012-11-10 14:56:20,163 INFO HTTP post thread HttpConnector - sent 13 chunks, got back 13 acks
2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender - collected 13 chunks
*2012-11-10 14:56:20,163 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8119214]*
2012-11-10 14:56:20,166 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2286662
2012-11-10 14:56:24,474 INFO Timer-0 ChukwaAgent - writing checkpoint 7258
2012-11-10 14:56:27,293 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP Got success back from http://10.5.200.204:8080/chukwa; response length 832
2012-11-10 14:56:27,294 INFO HTTP post thread HttpConnector - sent 13 chunks, got back 13 acks
2012-11-10 14:56:27,294 INFO HTTP post thread ChukwaHttpSender - collected 13 chunks
*2012-11-10 14:56:27,295 INFO Thread-6 WaitingQueue - MemLimitQueue is full [8091188]*
2012-11-10 14:56:27,302 INFO HTTP post thread ChukwaHttpSender - >>>>>> HTTP post to http://10.5.200.204:8080/ length = 2214008
2012-11-10 14:56:29,476 INFO Timer-0 ChukwaAgent - writing checkpoint 7259
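
To see how often the queue fills up over a longer stretch of agent.log, the
"MemLimitQueue is full" lines can be pulled out with a short script. This is
only a sketch: the function name and the sample text are made up for
illustration, the pattern is inferred from the excerpt above rather than from
the Chukwa source, and the meaning of the bracketed number (it looks like a
queued byte count) is an assumption.

```python
import re

# Pattern inferred from the log excerpt; \s+ tolerates the line wraps that
# mail clients may insert into long log lines.
FULL_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO \S+ "
    r"WaitingQueue\s+-\s+MemLimitQueue\s+is\s+full\s+\[(\d+)\]"
)


def queue_full_events(log_text):
    """Return (timestamp, bracketed_value) pairs for each 'queue full' line."""
    return [(ts, int(value)) for ts, value in FULL_RE.findall(log_text)]


# Sample text copied from the excerpt, including the mail-style wrapping.
SAMPLE = """\
2012-11-10 14:56:20,163 INFO HTTP post thread ChukwaHttpSender - collected
13 chunks
*2012-11-10 14:56:20,163 INFO Thread-6 WaitingQueue - MemLimitQueue is full
[8119214]*
2012-11-10 14:56:24,474 INFO Timer-0 ChukwaAgent - writing checkpoint 7258
*2012-11-10 14:56:27,295 INFO Thread-6 WaitingQueue - MemLimitQueue is full
[8091188]*
"""

if __name__ == "__main__":
    for ts, value in queue_full_events(SAMPLE):
        print(ts, value)
```

Run against a whole agent.log, the timestamps can be lined up with cluster
activity to confirm when the backlog starts and stops.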


Any ideas?

-- 
-- 
*Logan Hardy *| Operations Engineer
33Across <http://www.33across.com/> | Follow us:
Twitter<http://www.twitter.com/33across>
 | Facebook <http://www.facebook.com/33across>

o 801.231.4573

*Learn about our Q1 Brand Graph Category Insights
Report<http://www.33across.com/BrandGraph/33Across_BrandGraph_AQ1_2012.pdf>
*
*
33Across and Tynt in the News
*AdWeek • AllThingsD • Bloomberg • Forbes • TechCrunch • VentureBeat •
WSJ<http://33across.com/news.php#axzz1uqxl0v16>

Re: WaitingQueue - MemLimitQueue is full

Posted by Logan Hardy <lo...@33across.com>.
Eric,
  Thanks for your ideas on this. I've actually traced this issue to a
single saturated link in our datacenter. But you've given me some ideas on
how I can optimize this system some more. Thanks.

Logan
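
A saturated link fits the timestamps in the agent.log excerpt from the
original post. Assuming the "Got success back" line at 14:56:27,293 answers
the POST logged at 14:56:20,166, and that "length = 2286662" is a byte count,
the agent was moving on the order of 0.3 MB/s to the collector. The helper
below is a hypothetical sketch of that arithmetic, not part of Chukwa, and the
figure includes collector processing time as well as network transfer.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"  # timestamp layout used in the agent.log excerpt


def effective_throughput(post_ts, ack_ts, payload_bytes):
    """Bytes per second between a POST log line and its success line."""
    elapsed = (
        datetime.strptime(ack_ts, FMT) - datetime.strptime(post_ts, FMT)
    ).total_seconds()
    if elapsed <= 0:
        raise ValueError("ack timestamp must be later than post timestamp")
    return payload_bytes / elapsed


if __name__ == "__main__":
    # Values copied from the log excerpt in the original post.
    rate = effective_throughput(
        "2012-11-10 14:56:20,166", "2012-11-10 14:56:27,293", 2286662
    )
    print(f"{rate / 1024:.0f} KiB/s, {rate * 8 / 1e6:.2f} Mbit/s")
```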

On Sun, Nov 11, 2012 at 12:00 AM, Eric Yang <er...@gmail.com> wrote:

> Hi Logan,
>
> It looks like the datanode is saturated when a large MapReduce job is in
> progress.  The Chukwa agent will drop data on the floor if there is more
> data than the agent can buffer in memory.  Are the collectors running on a
> datanode?  Do you have multiple disks for the datanode?  It may be good to
> map the number of disks to (task slot - 1) and let the Chukwa collector
> write to a disk that is not used concurrently by MapReduce tasks, to provide
> good performance for both data injection and data processing.
>
> regards,
> Eric
>



Re: WaitingQueue - MemLimitQueue is full

Posted by Eric Yang <er...@gmail.com>.
Hi Logan,

It looks like the datanode is saturated when a large MapReduce job is in
progress.  The Chukwa agent will drop data on the floor if there is more
data than the agent can buffer in memory.  Are the collectors running on a
datanode?  Do you have multiple disks for the datanode?  It may be good to
map the number of disks to (task slot - 1) and let the Chukwa collector
write to a disk that is not used concurrently by MapReduce tasks, to provide
good performance for both data injection and data processing.

regards,
Eric
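
The idea of an agent that can only buffer so much in memory can be sketched
as a queue capped by total bytes rather than by item count. This is an
illustration only: ByteBoundedQueue, try_add and drain are hypothetical names,
not Chukwa's MemLimitQueue API, and whether the real agent waits or drops
data when the cap is hit is not shown here.

```python
from collections import deque


class ByteBoundedQueue:
    """Illustrative queue that caps the total bytes it holds (hypothetical)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._chunks = deque()

    def try_add(self, chunk):
        """Queue the chunk, or return False if it would exceed the byte cap."""
        if self.used_bytes + len(chunk) > self.max_bytes:
            return False  # queue is "full": the caller must wait, retry, or drop
        self._chunks.append(chunk)
        self.used_bytes += len(chunk)
        return True

    def drain(self, max_chunks):
        """Remove up to max_chunks chunks, as a sender would before a POST."""
        batch = []
        while self._chunks and len(batch) < max_chunks:
            chunk = self._chunks.popleft()
            self.used_bytes -= len(chunk)
            batch.append(chunk)
        return batch


if __name__ == "__main__":
    q = ByteBoundedQueue(max_bytes=10)
    print(q.try_add(b"12345"))   # fits under the cap
    print(q.try_add(b"123456"))  # would exceed the cap
    q.drain(max_chunks=1)        # the sender frees space
    print(q.try_add(b"123456"))  # fits again
```

When the sender drains more slowly than producers add, for example because
the path to the collector is congested, the queue stays at its cap and every
new chunk hits the "full" branch.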
