Posted to issues@hbase.apache.org by "Anoop Sam John (Jira)" <ji...@apache.org> on 2020/07/25 07:10:00 UTC

[jira] [Commented] (HBASE-24713) RS startup with FSHLog throws NPE after HBASE-21751

    [ https://issues.apache.org/jira/browse/HBASE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164794#comment-17164794 ] 

Anoop Sam John commented on HBASE-24713:
----------------------------------------

[~ram_krish], is it caused by HBASE-21751? Looking at the patch there, yes: in branch-2.1 that patch only moved the rollWriter() call from the constructor to init, so the call now happens after the disruptor is started.
But looking at the branch-2.2+ patches there, it seems they just added a catch at FSHLog creation and abort the RS in case of an exception. So at least there, this move of rollWriter happened as part of some other jira. So the actual reason for the NPE is the move of rollWriter, correct?
rollWriter() now happens as part of init. Here this API is called only to create the initial writer. As part of rollWriter()'s replaceWriter() call, we try to attain a safe point, and that includes a sync call. Previously this sync call did not happen because the roll call was in the constructor, and at that point the ringBufferEventHandler object in FSHLog was still null. Because of that, no waitSafePoint call was needed and so there was no sync call. Now the roll call is delayed, since it moved to init, which is called after ringBufferEventHandler has been initialized.
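To make the ordering concrete, here is a minimal toy model of the two orderings. This is not HBase code: ToyWal, Handler and the event strings are hypothetical names used only for illustration, and the real safe point logic is reduced to recording whether a sync would have been published while the writer was still null.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the ordering issue. Not HBase code; all names are hypothetical. */
public class WalOrderingSketch {
  /** Stand-in for the ring buffer event handler; only its presence matters here. */
  static final class Handler {}

  /** Minimal WAL-like object that records what a roll did. */
  static final class ToyWal {
    Handler handler;  // null until the "disruptor" side is set up
    Object writer;    // null until the first roll completes
    final List<String> events = new ArrayList<>();

    void rollWriter() {
      if (handler != null) {
        // A safe point is attempted, which publishes a sync.
        // Record whether a writer existed when that sync was published.
        events.add(writer == null ? "sync-with-null-writer" : "sync");
      }
      writer = new Object();
      events.add("writer-created");
    }
  }

  /** Old ordering: roll happens in the constructor, before the handler exists. */
  static List<String> rollInConstructor() {
    ToyWal wal = new ToyWal();
    wal.rollWriter();
    wal.handler = new Handler();
    return wal.events;
  }

  /** New ordering: handler is set up first, the roll happens later from init. */
  static List<String> rollInInit() {
    ToyWal wal = new ToyWal();
    wal.handler = new Handler();
    wal.rollWriter();
    return wal.events;
  }

  public static void main(String[] args) {
    System.out.println("constructor: " + rollInConstructor());
    System.out.println("init:        " + rollInInit());
  }
}
```

In this sketch only the second ordering publishes a sync before any writer exists, which corresponds to the situation where the sync thread sees a null writer.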
The null check is fine. Alternatively, we could add something like a writer check when trying to attain the safe point.
The code now:
{code}
    SafePointZigZagLatch zigzagLatch = null;
    long sequence = -1L;
    if (this.ringBufferEventHandler != null) {
      sequence = getSequenceOnRingBuffer();
      zigzagLatch = this.ringBufferEventHandler.attainSafePoint();
    }
    afterCreatingZigZagLatch();
    try {
      try {
        if (zigzagLatch != null) {
          assert sequence > 0L : "Failed to get sequence from ring buffer";
          TraceUtil.addTimelineAnnotation("awaiting safepoint");
          syncFuture = zigzagLatch.waitSafePoint(publishSyncOnRingBuffer(sequence, false));
        }
{code}
publishSyncOnRingBuffer is the only thing causing this sync call, and hence a run by the SyncRunner thread.
We can add
    if (this.writer != null && this.ringBufferEventHandler != null) {

I believe this would be a good addition. The null check is fine as well.
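As a rough sketch of what the extra writer check would change, assuming the guard simply skips the safe point when there is no current writer: the class below is a toy model, not the FSHLog implementation. ToyWal, replaceWriter's signature and the syncsPublished counter are hypothetical and only count how often a sync would be published.

```java
/** Toy model of the proposed guard. Not HBase code; all names are hypothetical. */
public class SafePointGuardSketch {
  static final class ToyWal {
    Object ringBufferEventHandler = new Object(); // already initialized, as after the move to init
    Object writer;                                // null until the first roll completes
    int syncsPublished = 0;

    void replaceWriter(Object nextWriter) {
      // Proposed condition: attain the safe point only when there is a writer to sync.
      if (this.writer != null && this.ringBufferEventHandler != null) {
        syncsPublished++; // stands in for publishing a sync and waiting on the safe point
      }
      this.writer = nextWriter;
    }
  }

  /** Rolls the writer the given number of times and reports how many syncs were published. */
  static int syncsAfterRolls(int rolls) {
    ToyWal wal = new ToyWal();
    for (int i = 0; i < rolls; i++) {
      wal.replaceWriter(new Object());
    }
    return wal.syncsPublished;
  }

  public static void main(String[] args) {
    System.out.println("after 1 roll:  " + syncsAfterRolls(1));
    System.out.println("after 3 rolls: " + syncsAfterRolls(3));
  }
}
```

With the guard in place the initial roll publishes no sync in this model, while every later roll still goes through the safe point.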

> RS startup with FSHLog throws NPE after HBASE-21751
> ---------------------------------------------------
>
>                 Key: HBASE-24713
>                 URL: https://issues.apache.org/jira/browse/HBASE-24713
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.1.6
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: Gaurav Kanade
>            Priority: Minor
>
> Every RS startup creates this NPE
> {code}
> [sync.1] wal.FSHLog: UNEXPECTED
> java.lang.NullPointerException
>         at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:582)
>         at java.lang.Thread.run(Thread.java:748)
> 2020-07-07 10:51:23,208 WARN  [regionserver/xxxxx:16020] wal.FSHLog: Failed sync-before-close but no outstanding appends; closing WALjava.lang.NullPointerException
> {code}
> The reason is that the Disruptor framework starts the SyncRunner thread, but the init of the writer happens after that. A simple null check in the SyncRunner will help here.
> No major damage happens though, since we handle Throwable. It would be good to solve this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)