Posted to user@hbase.apache.org by Gaurav Agarwal <ga...@arkin.net> on 2015/10/26 18:36:20 UTC

wal.FSHLog: Error syncing, request close of wal (regionserver crashes)

Hi All,

We are running HBase *Version 1.0.0-cdh5.4.2, rUnknown, Tue May 19
17:04:41 PDT 2015* and are hitting the problem described in HBASE-12074
(https://issues.apache.org/jira/browse/HBASE-12074), where the regionserver
crashes due to a concurrent roll of the WAL file.

Below are the failure logs from one of the instances in our environment:

2015-10-25 22:09:41,885 INFO [regionserver/localhost/127.0.0.1:60020.logRoller] wal.FSHLog: Rolled WAL /var/lib/hbase/data/WALs/localhost,60020,1445796437179/localhost%2C60020%2C1445796437179.default.1445810949648 with entries=11826, filesize=30.40 MB; new WAL /var/lib/hbase/data/WALs/localhost,60020,1445796437179/localhost%2C60020%2C1445796437179.default.1445810981882
2015-10-25 22:10:09,177 INFO [regionserver/localhost/127.0.0.1:60020.logRoller] wal.FSHLog: Rolled WAL /var/lib/hbase/data/WALs/localhost,60020,1445796437179/localhost%2C60020%2C1445796437179.default.1445810981882 with entries=7796, filesize=30.41 MB; new WAL /var/lib/hbase/data/WALs/localhost,60020,1445796437179/localhost%2C60020%2C1445796437179.default.1445811009174
2015-10-25 22:10:09,189 ERROR [sync.2] wal.FSHLog: Error syncing, request close of wal
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:176)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1334)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
    ... 2 more
2015-10-25 22:10:09,226 FATAL [regionserver/localhost/127.0.0.1:60020.logRoller] regionserver.HRegionServer: ABORTING region server localhost,60020,1445796437179: IOE in log roller
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:176)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1334)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
    ... 2 more
2015-10-25 22:10:09,226 FATAL [regionserver/localhost/127.0.0.1:60020.logRoller] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
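
To make the suspected failure pattern concrete, here is a minimal stand-alone sketch of how a sync call could end up with an IOException wrapping a NullPointerException, as in the trace above. It is not HBase code: the class names WalRollRaceSketch and SketchWriter are hypothetical, and it assumes (unverified against the HBase source) that the writer clears its stream reference on close while a sync thread still holds the old writer. The interleaving is replayed sequentially so the result is deterministic.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class WalRollRaceSketch {

    /** Hypothetical stand-in for a WAL writer; NOT the HBase ProtobufLogWriter. */
    static class SketchWriter {
        // Assumption: the stream reference is cleared when the writer is closed.
        private volatile OutputStream output = new ByteArrayOutputStream();

        void sync() throws IOException {
            try {
                // Throws NullPointerException if close() already cleared the field.
                output.flush();
            } catch (NullPointerException npe) {
                // Wrap the NPE so callers see an IOException with an NPE cause.
                throw new IOException(npe);
            }
        }

        void close() throws IOException {
            OutputStream out = output;
            if (out != null) {
                out.close();
                output = null;
            }
        }
    }

    /** Replays the suspected interleaving: the roller closes, then a stale sync runs. */
    static String describeFailure() {
        SketchWriter oldWriter = new SketchWriter();
        SketchWriter syncThreadView = oldWriter; // sync side still references the old writer
        try {
            oldWriter.close();     // "log roller" closes the old writer during a roll
            syncThreadView.sync(); // "sync thread" syncs the writer it captured earlier
            return "sync succeeded";
        } catch (IOException e) {
            return e.getClass().getName() + ": " + e.getCause().getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure());
    }
}
```

If this model is close to what happens, the fix would have to ensure a sync never runs against a writer that the roller has already closed; whether that matches the actual cause in 1.0.0-cdh5.4.2 is exactly what I would like confirmed.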

Does anyone know of a workaround for this problem, or whether a patch
exists for it?
If there is no workaround or patch, I can help create one, but I would need
some general guidance on what could be going on here.

--cheers, gaurav

Re: wal.FSHLog: Error syncing, request close of wal (regionserver crashes)

Posted by Gaurav Agarwal <ga...@arkin.net>.
Hi, just a bump on this post to check if anyone knows more about this...

-- 
--cheers, gaurav