Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2018/03/02 10:12:00 UTC

[jira] [Comment Edited] (HBASE-20090) Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run

    [ https://issues.apache.org/jira/browse/HBASE-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383145#comment-16383145 ] 

Ted Yu edited comment on HBASE-20090 at 3/2/18 10:11 AM:
---------------------------------------------------------

I added a log statement (the first line below) for the variables involved in the Preconditions check:
{code}
2018-03-02 00:58:18,880 DEBUG [MemStoreFlusher.0] regionserver.MemStoreFlusher: regionToFlush ATLAS_ENTITY_AUDIT_EVENTS,,1519927487389.6b67b274d95d61fcf4c5ab91e102994d.       regionToFlushSize=0 bestRegionReplica null bestRegionReplicaSize=0
2018-03-02 00:58:18,881 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: Cache flusher failed for entry org.apache.hadoop.hbase.regionserver.MemStoreFlusher$1@2a
java.lang.IllegalStateException
        at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkState(Preconditions.java:441)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:259)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$700(MemStoreFlusher.java:69)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:345)
{code}
We can see that bestRegionReplica was null and the region for ATLAS_ENTITY_AUDIT_EVENTS had a flush size of 0 (because TestTable was being written to, not ATLAS_ENTITY_AUDIT_EVENTS).

It seems the Preconditions check can be converted to a normal condition check.
[~ram_krish] [~anoop.hbase] [~anastas]:
Can you take a look at the patch?
Here is a snippet from the region server log during PE randomWrite:
{code}
2018-03-02 03:55:19,232 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: Flush of region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. due to global     heap pressure. Flush type=ABOVE_ONHEAP_HIGHER_MARKTotal Memstore Heap size=403.9 MTotal Memstore Off-Heap size=0, Region memstore size=0
2018-03-02 03:55:19,232 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: Nothing to flush for atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae.
2018-03-02 03:55:19,232 INFO  [MemStoreFlusher.1] regionserver.MemStoreFlusher: Excluding unflushable region atlas_janus,,1519927429371.fbcb5e495344542daf8b499e4bac03ae. -    trying to find a different region to flush.
{code}
Note that atlas_janus was not the table being written to; TestTable was.
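
For illustration, the "normal condition check" suggested above could look roughly like the sketch below. This is not the attached patch: the class name FlushCandidateCheck and the method hasFlushCandidate are hypothetical, and the regions are typed as Object so that the example compiles without HBase. Only the boolean expression is taken from the Preconditions.checkState call in the issue description.

```java
// Hypothetical sketch, not the HBASE-20090 patch: the class and method names
// are made up; the parameters mirror the variables printed in the DEBUG line.
final class FlushCandidateCheck {

  /**
   * Evaluates the same boolean expression that is passed to
   * Preconditions.checkState(...), but returns the result to the caller
   * instead of throwing IllegalStateException when it is false.
   */
  static boolean hasFlushCandidate(Object regionToFlush, long regionToFlushSize,
      Object bestRegionReplica, long bestRegionReplicaSize) {
    return (regionToFlush != null && regionToFlushSize > 0)
        || (bestRegionReplica != null && bestRegionReplicaSize > 0);
  }

  public static void main(String[] args) {
    // Values from the DEBUG line: a region with size 0 and no replica.
    boolean found = hasFlushCandidate("ATLAS_ENTITY_AUDIT_EVENTS", 0L, null, 0L);
    System.out.println("flush candidate found: " + found);
  }
}
```

With the values from the DEBUG line the method returns false. What the caller should do in that case (for example, return without flushing) is a decision for the patch and is not shown in this message.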



> Properly handle Preconditions check failure in MemStoreFlusher$FlushHandler.run
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-20090
>                 URL: https://issues.apache.org/jira/browse/HBASE-20090
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Major
>         Attachments: 20094.v01.patch
>
>
> Here is the code in branch-2:
> {code}
>         try {
>           wakeupPending.set(false); // allow someone to wake us up again
>           fqe = flushQueue.poll(threadWakeFrequency, TimeUnit.MILLISECONDS);
>           if (fqe == null || fqe instanceof WakeupFlushThread) {
> ...
>               if (!flushOneForGlobalPressure()) {
> ...
>           FlushRegionEntry fre = (FlushRegionEntry) fqe;
>           if (!flushRegion(fre)) {
>             break;
> ...
>         } catch (Exception ex) {
>           LOG.error("Cache flusher failed for entry " + fqe, ex);
>           if (!server.checkFileSystem()) {
>             break;
>           }
>         }
> {code}
> Inside flushOneForGlobalPressure():
> {code}
>       Preconditions.checkState(
>         (regionToFlush != null && regionToFlushSize > 0) ||
>         (bestRegionReplica != null && bestRegionReplicaSize > 0));
> {code}
> When the Preconditions check fails, the IllegalStateException is caught by the catch block shown above.
> However, the fqe is not flushed, resulting in potential data loss.
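
The control flow described above can be reproduced in isolation with the sketch below. None of it is HBase code: FlushLoopSketch, flushOne, and run are hypothetical names, and the local checkState helper only mimics the documented behavior of Guava's Preconditions.checkState(boolean), which throws IllegalStateException when its argument is false. The server.checkFileSystem() call from the real catch block is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the handler loop; none of this is HBase code.
final class FlushLoopSketch {

  // Mimics Guava's Preconditions.checkState(boolean): throws when false.
  static void checkState(boolean expression) {
    if (!expression) {
      throw new IllegalStateException();
    }
  }

  // Stand-in for flushOneForGlobalPressure(), reduced to a single size check.
  static boolean flushOne(long regionToFlushSize) {
    checkState(regionToFlushSize > 0);
    return true; // a real implementation would flush the region here
  }

  // Runs the loop body once per size and records what happened each time.
  static List<String> run(long... sizes) {
    List<String> events = new ArrayList<>();
    for (long size : sizes) {
      try {
        flushOne(size);
        events.add("flushed");
      } catch (Exception ex) {
        // Same shape as the catch block in the description: record the
        // failure and keep looping; nothing was flushed on this pass.
        events.add("failed: " + ex.getClass().getSimpleName());
      }
    }
    return events;
  }

  public static void main(String[] args) {
    // Prints [failed: IllegalStateException, flushed]
    System.out.println(run(0L, 128L));
  }
}
```

The first pass, with a size of 0 as in the DEBUG line, ends in the catch block without flushing anything, and the loop simply moves on to the next pass.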


