Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2007/04/07 01:36:32 UTC

[jira] Commented: (HADOOP-1221) high cpu usage in ReplicationMonitor thread

    [ https://issues.apache.org/jira/browse/HADOOP-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487360 ] 

Raghu Angadi commented on HADOOP-1221:
--------------------------------------


We were looking at the namenode code around the above trace. This is what it is doing:

max = 100; // in this case
for (it = invalidateSet.iterator(); it.hasNext() && max > 0; max--) {
    it.next();     // remove() is only legal after next()
    it.remove();
}

invalidateSet is not actually a Set but an ArrayList. Each it.remove() at the front shifts every remaining element down by one, so if the list holds 500 blocks, the above loop could shift about 450 blocks on each of its 100 iterations. This could be one of the things inflating CPU usage. We could use a LinkedList for this, and also stop calling it a 'Set', since the name suggests to readers that this container is a Set.
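
To make the cost estimate concrete, here is a minimal Java sketch. It is not the namenode code: the names InvalidateDrainSketch, drainFront, arrayListShifts and fill are made up for illustration, and Integer stands in for a block. It drains the first 100 entries through an iterator, which works the same way on an ArrayList and a LinkedList, and separately counts how many element moves the ArrayList version implies.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration only; not taken from FSNamesystem.
class InvalidateDrainSketch {

    // Removes up to max elements from the front of the list through its
    // iterator and returns them in their original order.
    static <T> List<T> drainFront(List<T> blocks, int max) {
        List<T> drained = new ArrayList<>();
        Iterator<T> it = blocks.iterator();
        while (it.hasNext() && drained.size() < max) {
            drained.add(it.next());
            it.remove(); // ArrayList: shifts the tail; LinkedList: unlinks one node
        }
        return drained;
    }

    // Element moves an ArrayList performs when `removals` elements are removed
    // one at a time from the front of a list that starts with `size` elements.
    static long arrayListShifts(int size, int removals) {
        long shifts = 0;
        for (int i = 1; i <= removals && i <= size; i++) {
            shifts += size - i; // elements left behind the removed slot
        }
        return shifts;
    }

    // Appends 0..n-1 to the given list and returns it.
    static List<Integer> fill(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
        return target;
    }

    public static void main(String[] args) {
        List<Integer> array = fill(new ArrayList<>(), 500);
        List<Integer> linked = fill(new LinkedList<>(), 500);
        drainFront(array, 100);
        drainFront(linked, 100);
        System.out.println("left in ArrayList:  " + array.size());
        System.out.println("left in LinkedList: " + linked.size());
        System.out.println("ArrayList element moves: " + arrayListShifts(500, 100));
    }
}
```

For 500 blocks and 100 removals the count is 499 + 498 + ... + 400 = 44950 moves, or about 450 per removal, which matches the estimate above. The LinkedList version does a constant amount of work per removal instead.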

If each it.remove() resulted in a big memmove(), shouldn't we have seen more Java frames above remove() in the stack trace?

Next we should also capture a pstack of the JVM so that we can see what this thread is doing inside the JVM.

Note that changing the container to a LinkedList might only reduce the CPU usage; it won't fix the bug, if there is one.


> high cpu usage in ReplicationMonitor thread 
> --------------------------------------------
>
>                 Key: HADOOP-1221
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1221
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Koji Noguchi
>
> We had a namenode stuck at 99% CPU, and it was showing a slow response time.
> (dfs.namenode.handler.count was still set to 10.)
> The ReplicationMonitor thread was using the most CPU time.
> jstack showed:
> "org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor@1c7b0f4d" daemon prio=10 tid=0x0000002d90690800 nid=0x4855 runnable [0x0000000041941000..0x0000000041941b30]
>    java.lang.Thread.State: RUNNABLE
>   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
>   at org.apache.hadoop.dfs.FSNamesystem.blocksToInvalidate(FSNamesystem.java:2475)
>   - locked <0x0000002a9f522038> (a org.apache.hadoop.dfs.FSNamesystem)
>   at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1775)
>   at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1713)
>   at java.lang.Thread.run(Thread.java:619)
