Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2010/03/29 19:35:25 UTC
[Hadoop Wiki] Update of "Hbase/Troubleshooting" by JeanDanielCryans
The "Hbase/Troubleshooting" page has been changed by JeanDanielCryans.
The comment on this change is: revamped the GC pauses entry.
http://wiki.apache.org/hadoop/Hbase/Troubleshooting?action=diff&rev1=38&rev2=39
--------------------------------------------------
== 9. Problem: ZooKeeper SessionExpired events ==
* Master or RegionServers reinitialize their ZooKeeper wrappers after receiving SessionExpired events.
* Master or RegionServer ephemeral nodes disappear while the node is still otherwise functional.
+ * Messages like these appear in the logs:
+ {{{
+ WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x278bd16a96000f to sun.nio.ch.SelectionKeyImpl@355811ec
+ java.io.IOException: TIMED OUT
+ at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
+ WARN org.apache.hadoop.hbase.util.Sleeper: We slept 79410ms, ten times longer than scheduled: 5000
+ INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server hostname/IP:PORT
+ INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/IP:PORT remote=hostname/IP:PORT]
+ INFO org.apache.zookeeper.ClientCnxn: Server connection successful
+ WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x278bd16a96000d to sun.nio.ch.SelectionKeyImpl@3544d65e
+ java.io.IOException: Session Expired
+ at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
+ at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
+ at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
+ ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired
+ }}}
=== Causes ===
- * Java GC is starving the ZooKeeper heartbeat thread.
+ * The JVM is performing a long-running garbage collection that pauses every thread (aka "stop the world").
+ * Since the region server's local ZooKeeper client cannot send heartbeats during the pause, the session times out.
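The "We slept 79410ms" warning in the log excerpt is the kind of message a thread can emit when it sleeps for a fixed period and wakes up far later than scheduled, which is a cheap way to notice a stop-the-world pause. The following is a minimal sketch of that idea only, not HBase's actual Sleeper code; the class name `PauseDetector`, its methods, and the factor of 10 are assumptions made for illustration.

```java
public class PauseDetector {
    /** How many milliseconds longer than scheduled the sleep took (never negative). */
    static long overshootMillis(long scheduledMillis, long actualMillis) {
        return Math.max(0L, actualMillis - scheduledMillis);
    }

    /** True when the actual sleep lasted at least `factor` times the scheduled period. */
    static boolean isSuspiciousPause(long scheduledMillis, long actualMillis, int factor) {
        return actualMillis >= scheduledMillis * factor;
    }

    public static void main(String[] args) throws InterruptedException {
        long scheduled = 50L; // hypothetical period, kept short for the demo
        long start = System.nanoTime();
        Thread.sleep(scheduled);
        long actual = (System.nanoTime() - start) / 1_000_000L;

        // On a healthy JVM this branch is normally not taken.
        if (isSuspiciousPause(scheduled, actual, 10)) {
            System.out.println("Slept " + actual + "ms, scheduled " + scheduled
                + "ms (overshoot " + overshootMillis(scheduled, actual) + "ms)");
        }
    }
}
```

With the values from the log above, a scheduled sleep of 5000 ms that actually took 79410 ms overshoots by 74410 ms, which exceeds the 60-second session timeout on its own.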
=== Resolution ===
+ * Make sure you give HBase plenty of RAM (in hbase-env.sh); the default of 1GB won't be able to sustain long-running imports.
+ * Make sure you don't swap; the JVM never behaves well under swapping.
+ * Make sure you are not CPU starving the region server thread. For example, if you are running a MapReduce job using 6 CPU-intensive tasks on a machine with 4 cores, you are probably starving the region server enough to create longer garbage collection pauses.
- * Increase the session timeout. For example, add the following to your hbase-site.xml to increase the timeout from the default of 10 seconds to 60 seconds.
+ * If you wish to increase the session timeout, add the following to your hbase-site.xml. This example raises the timeout from the default of 60 seconds to 120 seconds.
{{{
<property>
<name>zookeeper.session.timeout</name>
- <value>60000</value>
+ <value>120000</value>
</property>
+ <property>
+ <name>hbase.zookeeper.property.tickTime</name>
+ <value>6000</value>
+ </property>
}}}
- * For Java SE 6, some users have had success with {{{ -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelGCThreads=8 }}}. See HBase [[PerformanceTuning|Performance Tuning]] for more on JVM GC tuning.
+ * Be aware that setting a higher timeout means that the regions served by a failed region server will take at least that amount of time to be transferred to another region server. For a production system serving live requests, we would instead recommend setting it lower than 1 minute and over-provisioning your cluster in order to lower the memory load on each machine (hence having less garbage to collect per machine).
+ * If this is happening during a one-time upload (like initially loading all your data into HBase), consider [[http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk|importing into HFiles directly]].
+ * HBase ships with some GC tuning. For more information, see [[PerformanceTuning|Performance Tuning]].
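A likely reason the configuration above raises tickTime together with the session timeout: by default a ZooKeeper server limits the session timeout a client may negotiate to between 2 and 20 times its tickTime, so a larger timeout requested on its own can be silently reduced. The sketch below only illustrates that arithmetic. The class name `SessionTimeoutCheck` and its method are hypothetical, and the 2000 ms tickTime in the first call is an assumed value for comparison, not something stated on this page.

```java
public class SessionTimeoutCheck {
    /**
     * Effective session timeout after the server applies its default bounds
     * of [2 * tickTime, 20 * tickTime]. All values are in milliseconds.
     */
    static long negotiatedTimeout(long requestedMillis, long tickTimeMillis) {
        long min = 2 * tickTimeMillis;
        long max = 20 * tickTimeMillis;
        return Math.max(min, Math.min(max, requestedMillis));
    }

    public static void main(String[] args) {
        // Assumed tickTime of 2000 ms: a 120-second request is capped at 40 seconds.
        System.out.println(negotiatedTimeout(120_000L, 2_000L)); // prints 40000
        // tickTime of 6000 ms, as in the snippet above: 120 seconds fits exactly.
        System.out.println(negotiatedTimeout(120_000L, 6_000L)); // prints 120000
    }
}
```

If the timeout the server reports after connecting is lower than the one configured, checking it against 20 times the tickTime is a reasonable first step.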
<<Anchor(10)>>