Posted to issues@hbase.apache.org by "Rushabh Shah (Jira)" <ji...@apache.org> on 2021/11/09 16:53:00 UTC
[jira] [Assigned] (HBASE-26435) [branch-1] The log rolling request maybe canceled immediately in LogRoller due to a race

     [ https://issues.apache.org/jira/browse/HBASE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh Shah reassigned HBASE-26435:
------------------------------------

    Assignee: Rushabh Shah

> [branch-1] The log rolling request maybe canceled immediately in LogRoller due to a race 
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-26435
>                 URL: https://issues.apache.org/jira/browse/HBASE-26435
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>    Affects Versions: 1.6.0
>            Reporter: Rushabh Shah
>            Assignee: Rushabh Shah
>            Priority: Major
>             Fix For: 1.7.2
>
>
> We saw this issue in our internal 1.6 branch.
> The WAL was rolled, but the new WAL file was not writable, and the following errors were also logged:
> {noformat}
> 2021-11-03 19:20:19,503 WARN  [.168:60020.logRoller] hdfs.DFSClient - Error while syncing
> java.io.IOException: Could not get block locations. Source file "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389" - Aborting...
>         at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
>         at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
>         at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
> 2021-11-03 19:20:19,507 WARN  [.168:60020.logRoller] wal.FSHLog - pre-sync failed but an optimization so keep going
> java.io.IOException: Could not get block locations. Source file "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389" - Aborting...
>         at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
>         at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
>         at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
> {noformat}
> Since the new WAL file was not writable, appends to that file started failing immediately after it was rolled.
> {noformat}
> 2021-11-03 19:20:19,677 INFO  [.168:60020.logRoller] wal.FSHLog - Rolled WAL /hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635965392022 with entries=253234, filesize=425.67 MB; new WAL /hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389
> 2021-11-03 19:20:19,690 WARN  [020.append-pool17-t1] wal.FSHLog - Append sequenceId=1962661783, requesting roll of WAL
> java.io.IOException: Could not get block locations. Source file "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389" - Aborting...
>         at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
>         at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
>         at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
> 2021-11-03 19:20:19,690 INFO  [.168:60020.logRoller] wal.FSHLog - Archiving hdfs://prod-EMPTY-hbase2a/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635960792837 to hdfs://prod-EMPTY-hbase2a/hbase/oldWALs/hbase2a-dnds1-232-ukb.ops.sfdc.net%2C60020%2C1635567166484.1635960792837
> {noformat}
> The LogRoller thread always resets the rollLog flag after the rollWal call completes.
> FSHLog#rollWriter does many things, such as replacing the writer and archiving old logs. If an append thread fails to write to the new file while the LogRoller thread is still archiving old logs, the roll request it raises is lost: LogRoller resets the flag to false as soon as the in-progress rollWriter call returns.
> Relevant code: https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java#L183-L203
> We need to reset the rollLog flag before we start rolling the WAL.
> This was fixed in branch-2 and master via HBASE-22684, but the fix was never applied to branch-1.
> Also, branch-2 has the multi-WAL implementation, so the patch cannot be applied cleanly to branch-1.
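The ordering problem described above can be illustrated with a small, single-threaded Java model. This is a hypothetical sketch, not HBase code: the class RollFlagModel, its rollOnce and requestSurvives methods, and the duringRoll hook are invented for illustration. The hook stands in for the work FSHLog#rollWriter does (replacing the writer, archiving old logs) while an append thread requests another roll. It models only the order of "clear the flag" versus "roll", not the actual LogRoller implementation or the HBASE-22684 patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the rollLog flag race; not the real LogRoller.
final class RollFlagModel {
    // Set by append/sync threads when they want a new WAL.
    final AtomicBoolean rollLog = new AtomicBoolean(false);

    void requestRoll() {
        rollLog.set(true);
    }

    // One iteration of a roller loop. duringRoll simulates rollWriter();
    // a failing append may call requestRoll() while it runs.
    void rollOnce(boolean resetBeforeRoll, Runnable duringRoll) {
        // Fixed ordering: atomically read and clear the request up front.
        // Buggy ordering: only read it here and clear it after the roll.
        boolean requested = resetBeforeRoll ? rollLog.getAndSet(false) : rollLog.get();
        if (!requested) {
            return; // nothing to do
        }
        duringRoll.run(); // simulated rollWriter()
        if (!resetBeforeRoll) {
            // Buggy ordering: this also wipes any request made during the roll.
            rollLog.set(false);
        }
    }

    // Returns true if a roll request raised during rollWriter is still pending.
    static boolean requestSurvives(boolean resetBeforeRoll) {
        RollFlagModel roller = new RollFlagModel();
        roller.requestRoll();                                  // initial roll request
        roller.rollOnce(resetBeforeRoll, roller::requestRoll); // append fails on the new WAL mid-roll
        return roller.rollLog.get();
    }

    public static void main(String[] args) {
        System.out.println("reset after roll  -> pending=" + requestSurvives(false));
        System.out.println("reset before roll -> pending=" + requestSurvives(true));
    }
}
```

Running main prints pending=false for the reset-after-roll ordering and pending=true for the reset-before-roll ordering. In this model, clearing the flag before rolling means a request raised mid-roll is still set when the loop next checks it, so it causes another roll instead of being silently dropped.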



--
This message was sent by Atlassian Jira
(v8.20.1#820001)