You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hadoop.apache.org by Wenqi Ma <ma...@gmail.com> on 2020/04/22 11:50:39 UTC

Datanode cannot start

Dear Hadoop community,

A strange problem happened in our fresh installed hdfs cluster. The
datanode cannot start because two threads tried to load storage, the latter
would fail and lead a FATAL error.
As the following log shown, the blue text
means DataStorage::addStorageLocations is called, the red text are I added
to find the thread details.
I compared the log with normal datanode's in another cluster, and there is
only one line (blue text).

Any suggestion would be appreciated, thanks!

2020-04-22 19:32:25,666 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received
for nameservices: nameservice1
2020-04-22 19:32:25,718 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices
for nameservices: nameservice1
2020-04-22 19:32:25,726 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering>
(Datanode Uuid unassigned) service to node238/172.16.2.238:8020 starting to
offer service
2020-04-22 19:32:25,727 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering>
(Datanode Uuid unassigned) service to node239/172.16.2.239:8020 starting to
offer service
2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 50020: starting
2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2020-04-22 19:32:25,885 INFO org.apache.hadoop.hdfs.server.common.Storage:
kkk current thread ID: 30
2020-04-22 19:32:25,885 INFO org.apache.hadoop.hdfs.server.common.Storage:
kkk current thread: Thread[DataNode:
[[[DISK]file:/vdir/mnt/disk2/hadoop/data/]]  heartbeating to node239/
172.16.2.239:8020,5,main]

2020-04-22 19:32:25,885 INFO
org.apache.hadoop.hdfs.server.common.Storage: Using
1 threads to upgrade data directories
(dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2020-04-22 19:32:25,890 INFO org.apache.hadoop.hdfs.server.common.Storage:
Lock on /vdir/mnt/disk2/hadoop/data/in_use.lock acquired by nodename
35@node238
2020-04-22 19:32:25,918 INFO org.apache.hadoop.hdfs.server.common.Storage:
Analyzing storage directories for bpid
BP-1558632922-172.16.2.238-1587535931027
2020-04-22 19:32:25,918 INFO org.apache.hadoop.hdfs.server.common.Storage:
Locking is disabled for
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
2020-04-22 19:32:25,920 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage:
nsid=613035685;bpid=BP-1558632922-172.16.2.238-1587535931027;lv=-56;nsInfo=lv=-63;cid=hdfs1;nsid=613035685;c=0;bpid=BP-1558632922-172.16.2.238-1587535931027;dnuuid=0b14101d-f4f7-4004-9eaf-ffd157d4aefe
2020-04-22 19:32:25,931 INFO org.apache.hadoop.hdfs.server.common.Storage:
kkk current thread ID: 29
2020-04-22 19:32:25,931 INFO org.apache.hadoop.hdfs.server.common.Storage:
kkk current thread: Thread[DataNode:
[[[DISK]file:/vdir/mnt/disk2/hadoop/data/]]  heartbeating to node238/
172.16.2.238:8020,5,main]
2020-04-22 19:32:25,931 INFO
org.apache.hadoop.hdfs.server.common.Storage: Using
1 threads to upgrade data directories
(dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2020-04-22 19:32:25,932 INFO org.apache.hadoop.hdfs.server.common.Storage:
Storage directory [DISK]file:/vdir/mnt/disk2/hadoop/data/ has already been
used.
2020-04-22 19:32:25,950 INFO org.apache.hadoop.hdfs.server.common.Storage:
Analyzing storage directories for bpid
BP-1558632922-172.16.2.238-1587535931027
2020-04-22 19:32:25,950 WARN org.apache.hadoop.hdfs.server.common.Storage:
Failed to analyze storage directories for block pool
BP-1558632922-172.16.2.238-1587535931027
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt
to load an used block storage:
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
    at
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:217)
    at
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
    at
org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:482)
    at
org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:408)

    at
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:587)
    at
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1397)

    at
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1362)

    at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
    at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
    at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)

    at java.lang.Thread.run(Thread.java:748)

-- 
Best Regards!
Wenqi

Re: Datanode cannot start

Posted by Gurmukh Singh <gu...@yahoo.com.INVALID>.
Hi Wenqi,

Which version of hadoop are you using? (My initial guess is 3.X, as )

Can you see anything in the datanode logs on the lines of 
"org.apache.hadoop.hdfs.server.common.InconsistentFSStateException"

Although, you mention that it is new cluster, make sure the ClusterID 
matches. --> "attempt to load an used block storage: 
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027"


On 22/4/20 9:50 pm, Wenqi Ma wrote:
> Dear Hadoop community,
>
> A strange problem happened in our fresh installed hdfs cluster. The 
> datanode cannot start because two threads tried to load storage, the 
> latter would fail and lead a FATAL error.
> As the following log shown, the blue text 
> means DataStorage::addStorageLocations is called, the red text are I 
> added to find the thread details.
> I compared the log with normal datanode's in another cluster, and 
> there is only one line (blue text).
>
> Any suggestion would be appreciated, thanks!
>
> 2020-04-22 19:32:25,666 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request 
> received for nameservices: nameservice1
> 2020-04-22 19:32:25,718 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Starting 
> BPOfferServices for nameservices: nameservice1
> 2020-04-22 19:32:25,726 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool 
> <registering> (Datanode Uuid unassigned) service to 
> node238/172.16.2.238:8020 <http://172.16.2.238:8020> starting to offer 
> service
> 2020-04-22 19:32:25,727 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool 
> <registering> (Datanode Uuid unassigned) service to 
> node239/172.16.2.239:8020 <http://172.16.2.239:8020> starting to offer 
> service
> 2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 50020: starting
> 2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2020-04-22 19:32:25,885 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 30
> 2020-04-22 19:32:25,885 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread: 
> Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]] 
>  heartbeating to node239/172.16.2.239:8020 
> <http://172.16.2.239:8020>,5,main]
> 2020-04-22 19:32:25,885 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to 
> upgrade data directories 
> (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
> 2020-04-22 19:32:25,890 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Lock on 
> /vdir/mnt/disk2/hadoop/data/in_use.lock acquired by nodename 35@node238
> 2020-04-22 19:32:25,918 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage 
> directories for bpid BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,918 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for 
> /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,920 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: 
> nsid=613035685;bpid=BP-1558632922-172.16.2.238-1587535931027;lv=-56;nsInfo=lv=-63;cid=hdfs1;nsid=613035685;c=0;bpid=BP-1558632922-172.16.2.238-1587535931027;dnuuid=0b14101d-f4f7-4004-9eaf-ffd157d4aefe
> 2020-04-22 19:32:25,931 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 29
> 2020-04-22 19:32:25,931 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread: 
> Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]] 
>  heartbeating to node238/172.16.2.238:8020 
> <http://172.16.2.238:8020>,5,main]
> 2020-04-22 19:32:25,931 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to 
> upgrade data directories 
> (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
> 2020-04-22 19:32:25,932 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Storage directory 
> [DISK]file:/vdir/mnt/disk2/hadoop/data/ has already been used.
> 2020-04-22 19:32:25,950 INFO 
> org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage 
> directories for bpid BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,950 WARN 
> org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze 
> storage directories for block pool 
> BP-1558632922-172.16.2.238-1587535931027
> java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: 
> attempt to load an used block storage: 
> /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
>     at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:217)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:482)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:408) 
>
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:587)
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1397) 
>
>     at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1362) 
>
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) 
>
>     at java.lang.Thread.run(Thread.java:748)
>
> -- 
> Best Regards!
> Wenqi
>