Posted to dev@hbase.apache.org by "ryan rawson (JIRA)" <ji...@apache.org> on 2009/06/23 10:20:07 UTC

[jira] Commented: (HBASE-1569) rare race condition can take down a regionserver.

    [ https://issues.apache.org/jira/browse/HBASE-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723002#action_12723002 ] 

ryan rawson commented on HBASE-1569:
------------------------------------

This is a race condition. Here is how it happens:

doMetrics() calls getStorefilesIndexSize(), which gets a view of the storefiles ConcurrentSkipListMap at some point in time. Working on this snapshot, it asks each store file in turn for its index size.

In another thread, the compaction completion code runs. It does the following, in order:
- removes the compacted store files from the storefiles list
- does some other work
- closes those store files, which sets this.reader to null

Back in thread #1, we call getReader() on one of the store files that has just been closed, find this.reader == null, and throw the exception.
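
The interleaving described above can be replayed deterministically in a single thread. This is only a sketch under stated assumptions: FakeStoreFile and RaceReplaySketch are hypothetical stand-ins, not the real HBase classes, and the snapshot is modelled as a plain list copy so the ordering is fixed. The exception type and message mirror the stack trace in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

class RaceReplaySketch {
    /** Hypothetical stand-in for a store file; NOT the real HBase StoreFile. */
    static class FakeStoreFile {
        // Plays the role of this.reader: null once the file is closed.
        private volatile Long indexSize;

        FakeStoreFile(long indexSize) {
            this.indexSize = indexSize;
        }

        long indexSize() {
            Long size = this.indexSize;
            if (size == null) {
                // Same error type and message as in the reported stack trace.
                throw new IllegalAccessError("Call open first");
            }
            return size;
        }

        void close() {
            this.indexSize = null;
        }
    }

    /** Replays the two threads' steps in the problematic order. */
    static String replayInterleaving() {
        ConcurrentSkipListMap<Long, FakeStoreFile> storefiles = new ConcurrentSkipListMap<>();
        storefiles.put(1L, new FakeStoreFile(100L));
        storefiles.put(2L, new FakeStoreFile(50L));

        // Thread #1 (metrics): take a view of the store files.
        List<FakeStoreFile> snapshot = new ArrayList<>(storefiles.values());

        // Thread #2 (compaction completion): remove from the map, then close.
        FakeStoreFile compacted = storefiles.remove(1L);
        compacted.close();

        // Thread #1 resumes and walks its now-stale snapshot.
        try {
            long total = 0;
            for (FakeStoreFile sf : snapshot) {
                total += sf.indexSize();
            }
            return "total=" + total;
        } catch (IllegalAccessError e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replayInterleaving()); // prints "failed: Call open first"
    }
}
```

Removing the file from the map does not help thread #1, because it already holds a reference to the file through its snapshot.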

So we need to do one of the following:
- synchronize on this map, or use a synchronized version of the map
- allow the metrics to be checked without causing an RS abort when we hit an exception: either catch it, or prevent it from happening
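
A rough sketch of both options, for discussion only. MetricsFixSketch, FakeStoreFile, lockedIndexSize, completeCompaction and tolerantIndexSize are hypothetical names, not existing HBase code, and the read/write lock is just one assumed way to "sync" the metrics walk against the remove-and-close step.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class MetricsFixSketch {
    /** Hypothetical stand-in for a store file; NOT the real HBase StoreFile. */
    static class FakeStoreFile {
        private volatile Long indexSize; // null once closed, like this.reader

        FakeStoreFile(long indexSize) {
            this.indexSize = indexSize;
        }

        long indexSize() {
            Long size = this.indexSize;
            if (size == null) {
                throw new IllegalAccessError("Call open first");
            }
            return size;
        }

        void close() {
            this.indexSize = null;
        }
    }

    private final ConcurrentSkipListMap<Long, FakeStoreFile> storefiles =
        new ConcurrentSkipListMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void add(long id, long indexSize) {
        storefiles.put(id, new FakeStoreFile(indexSize));
    }

    /** Option 1: metrics holds the read lock for the whole walk. */
    long lockedIndexSize() {
        lock.readLock().lock();
        try {
            long total = 0;
            for (FakeStoreFile sf : storefiles.values()) {
                total += sf.indexSize();
            }
            return total;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Option 1, other side: remove and close happen under the write lock. */
    void completeCompaction(long id) {
        lock.writeLock().lock();
        try {
            FakeStoreFile old = storefiles.remove(id);
            if (old != null) {
                old.close();
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Option 2: no locking; skip files that were closed underneath us. */
    static long tolerantIndexSize(Iterable<FakeStoreFile> files) {
        long total = 0;
        for (FakeStoreFile sf : files) {
            try {
                total += sf.indexSize();
            } catch (IllegalAccessError e) {
                // File was closed by a finishing compaction; leave it out of the metric.
            }
        }
        return total;
    }
}
```

Option 1 keeps the metric exact but makes the metrics thread and compaction completion wait on each other. Option 2 may under-report the index size for one metrics cycle, which seems an acceptable trade for not aborting the regionserver.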

> rare race condition can take down a regionserver. 
> --------------------------------------------------
>
>                 Key: HBASE-1569
>                 URL: https://issues.apache.org/jira/browse/HBASE-1569
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: ryan rawson
>            Priority: Critical
>             Fix For: 0.20.0
>
>
> This happened after more than 24 hours of heavy import load on my cluster. Luckily the shutdown seemed to be clean:
> java.lang.IllegalAccessError: Call open first
>         at org.apache.hadoop.hbase.regionserver.StoreFile.getReader(StoreFile.java:356)
>         at org.apache.hadoop.hbase.regionserver.Store.getStorefilesIndexSize(Store.java:1378)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:1075)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:454)
>         at java.lang.Thread.run(Thread.java:619)
