Posted to hdfs-dev@hadoop.apache.org by "Yicong Cai (JIRA)" <ji...@apache.org> on 2019/02/22 07:35:00 UTC

[jira] [Created] (HDFS-14311) multi-threading conflict at layoutVersion when loading block pool storage

Yicong Cai created HDFS-14311:
---------------------------------

             Summary: multi-threading conflict at layoutVersion when loading block pool storage
                 Key: HDFS-14311
                 URL: https://issues.apache.org/jira/browse/HDFS-14311
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: rolling upgrades
    Affects Versions: 2.9.2
            Reporter: Yicong Cai


When a DataNode is upgraded from 2.7.3 to 2.9.2, a conflict occurs on StorageInfo.layoutVersion while loading block pool storage.

This causes the following exception:

 
{panel:title=exceptions}
2019-02-15 10:18:01,357 [13783] - INFO [Thread-33:BlockPoolSliceStorage@395] - Restored 36974 block files from trash before the layout upgrade. These blocks will be moved to the previous directory during the upgrade
2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:BlockPoolSliceStorage@226] - Failed to analyze storage directories for block pool BP-1216718839-10.120.232.23-1548736842023
java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state: LV = -63 CTime = 0
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
 at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
 at java.lang.Thread.run(Thread.java:748)
2019-02-15 10:18:01,358 [13784] - WARN [Thread-33:DataStorage@472] - Failed to add storage directory [DISK]file:/mnt/dfs/2/hadoop/hdfs/data/ for block pool BP-1216718839-10.120.232.23-1548736842023
java.io.IOException: Datanode state: LV = -57 CTime = 0 is newer than the namespace state: LV = -63 CTime = 0
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:406)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadStorageDirectory(BlockPoolSliceStorage.java:177)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:221)
 at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:460)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:390)
 at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:556)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1649)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1610)
 at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:388)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:280)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:816)
 at java.lang.Thread.run(Thread.java:748) 
{panel}
 

Root cause:

A single BlockPoolSliceStorage instance is shared by the recover-transition of all storage locations. BlockPoolSliceStorage.doTransition reads the old layoutVersion from local storage, compares it with the current DataNode version, and then performs the upgrade. doUpgrade hands the transition work to a sub-thread, and that work sets the BlockPoolSliceStorage's layoutVersion to the current DN version. The transition check for the next storage dir therefore runs concurrently with the actual transition work of the previous storage dir, so the two conflict on the layoutVersion of the shared BlockPoolSliceStorage instance and the check can see an inconsistent value.
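
The race described above can be reproduced in isolation. The following is only a minimal sketch, not the Hadoop code: the class SharedLayoutVersionRace, its SharedStorage holder, the methods checkSecondDir and decide, and the layout version values are all hypothetical and chosen for illustration. A latch forces the interleaving in which the upgrade task of the first storage dir writes the shared field after the second storage dir has loaded its on-disk value but before it compares it.

```java
import java.util.concurrent.CountDownLatch;

public class SharedLayoutVersionRace {
    // Illustrative values only: a more negative number means a newer layout.
    static final int OLD_LV = -56;
    static final int CURRENT_LV = -57;

    /** Hypothetical stand-in for the storage object shared by all storage dirs. */
    static class SharedStorage {
        volatile int layoutVersion;
    }

    /**
     * Checks the second storage dir while the first dir's upgrade task runs in
     * the window between "read" and "compare". Returns the decision made for
     * the second dir.
     */
    static String checkSecondDir(boolean useSharedField) {
        SharedStorage shared = new SharedStorage();
        CountDownLatch secondDirRead = new CountDownLatch(1);

        // Upgrade task of the first dir: it ends by stamping the shared
        // object with the current layout version.
        Thread upgradeFirstDir = new Thread(() -> {
            try {
                secondDirRead.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            shared.layoutVersion = CURRENT_LV;
        });
        upgradeFirstDir.start();

        // Second dir: load the on-disk version into the shared field.
        int onDisk = OLD_LV;
        shared.layoutVersion = onDisk;

        // Force the problematic interleaving deterministically.
        secondDirRead.countDown();
        try {
            upgradeFirstDir.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Compare either the (now overwritten) shared field or a local copy.
        int observed = useSharedField ? shared.layoutVersion : onDisk;
        return decide(observed);
    }

    /** Upgrade only when the observed version is older than the current one. */
    static String decide(int lv) {
        if (lv > CURRENT_LV) {
            return "upgrade";
        }
        return "no upgrade (LV = " + lv + ")";
    }

    public static void main(String[] args) {
        System.out.println("shared field: " + checkSecondDir(true));
        System.out.println("local copy:   " + checkSecondDir(false));
    }
}
```

In this sketch the check based on the shared field no longer sees the old on-disk version, while the check based on a per-directory local copy still decides to upgrade. Keeping the value read from each storage dir out of the shared instance is one possible way to avoid the conflict; it is shown here as an assumption, not as the fix adopted for this issue.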

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
