Posted to user@hbase.apache.org by Victor Xu <vi...@gmail.com> on 2014/04/04 05:13:01 UTC

Bulkload process hangs on regions randomly and finally throws RegionTooBusyException

Hi, all
I came across this problem in the early morning several days ago. It
happened when I used the hadoop completebulkload command to bulk load some
HDFS files into an HBase table. Several regions hung, and after three
retries they threw RegionTooBusyException. Fortunately, I captured jstack
output from the affected region's HRegionServer process just in time.
I found that the bulkload process was waiting for a write lock:
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
The lock id is 0x00000004054ecbf0.
Meanwhile, many other Get/Scan operations were waiting on the same lock id,
but for the read lock:
at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:873)
The strangest thing is that NO ONE OWNED THE LOCK! I searched the jstack
output carefully but could not find any thread that claimed to hold it.
When I restarted the bulk load, it failed on different regions but with the
same RegionTooBusyException.
I guessed the region might have been doing some compactions at the time and
holding the lock, but I could not find any compaction info in the HBase logs.
Finally, after several days of digging, the only temporary workaround I
found was TRIGGERING A MAJOR COMPACTION BEFORE THE BULKLOAD.
So which thread owned the lock? Has anyone come across the same problem
before?
PS: I compressed the jstack file because it exceeded the size limit of this
mailing list.
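
A minimal sketch of the locking pattern those stack traces suggest
(illustrative only, not the actual HBase source; the class and method names
are made up): Get/Scan-style operations take the read side of a shared
ReentrantReadWriteLock, while the bulk load takes the write side with a
bounded wait and gives up with a "too busy" error when the wait times out.

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Illustrative sketch only -- not HBase code. One lock per "region":
    // readers (Get/Scan) share the read lock, a bulk load needs the write lock.
    public class RegionLockSketch {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        // Read path: parks on the ReadLock while a writer holds the lock
        // (or is first in the wait queue), which matches the waiting
        // Get/Scan threads described above.
        public void get() throws InterruptedException {
            lock.readLock().lockInterruptibly();
            try {
                // ... read from the region ...
            } finally {
                lock.readLock().unlock();
            }
        }

        // Bulk load path: wait for the write lock, but only for a bounded
        // time; on timeout fail with a "region too busy" style error instead
        // of blocking forever.
        public void bulkLoad(long waitMillis) throws InterruptedException {
            if (!lock.writeLock().tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("failed to get the write lock in "
                    + waitMillis + " ms -- region too busy");
            }
            try {
                // ... move the bulk-loaded HFiles into place ...
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

In this pattern, if some thread holds the read lock and never releases it,
every bulkLoad() retry times out, and once the writer is queued new readers
can also end up parking on the read lock -- which is essentially the picture
in the jstack output.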

Re: Bulkload process hangs on regions randomly and finally throws RegionTooBusyException

Posted by Ted Yu <yu...@gmail.com>.
Looking below the 'parking to wait for' line, we see:

  at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4840)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2279)

For HRegionServer.java at the tip of 0.94, line 2279 is in the put() call.

What version of HBase are you using?
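
One JDK-level detail that may be relevant to the "no one owned the lock"
observation: a thread dump reports an owner only for exclusively held
synchronizers, so a thread holding just the read side of a
ReentrantReadWriteLock never shows up as the lock's owner, even though it
blocks any writer. A small self-contained demo (plain JDK, nothing
HBase-specific; the class name is made up):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class ReadLockOwnerDemo {
        public static void main(String[] args) throws Exception {
            final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

            // A reader grabs the read lock and sits on it.
            Thread reader = new Thread(new Runnable() {
                public void run() {
                    rwl.readLock().lock();
                    try {
                        Thread.sleep(60000);
                    } catch (InterruptedException ignored) {
                        // interrupted by main(): fall through and unlock
                    } finally {
                        rwl.readLock().unlock();
                    }
                }
            }, "reader");
            reader.start();
            Thread.sleep(500); // give the reader time to acquire the lock

            // The write lock cannot be taken while the read lock is held...
            System.out.println("writeLock().tryLock() = "
                + rwl.writeLock().tryLock());

            // ...yet no thread reports the lock's synchronizer as owned,
            // because only exclusive (write) ownership is recorded.
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            for (ThreadInfo ti : mx.dumpAllThreads(false, true)) {
                System.out.println(ti.getThreadName() + " owns "
                    + ti.getLockedSynchronizers().length + " synchronizer(s)");
            }
            reader.interrupt();
        }
    }

The same limitation applies to jstack's "Locked ownable synchronizers"
section, which would be consistent with a dump that shows a write-lock
waiter but no visible owner.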

