You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hadoop.apache.org by Wenqi Ma <ma...@gmail.com> on 2020/04/22 11:50:39 UTC
Datanode cannot start
Dear Hadoop community,
A strange problem happened in our fresh installed hdfs cluster. The
datanode cannot start because two threads tried to load storage, the latter
would fail and lead a FATAL error.
As the following log shown, the blue text
means DataStorage::addStorageLocations is called, the red text are I added
to find the thread details.
I compared the log with normal datanode's in another cluster, and there is
only one line (blue text).
Any suggestion would be appreciated, thanks!
2020-04-22 19:32:25,666 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request received
for nameservices: nameservice1
2020-04-22 19:32:25,718 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting BPOfferServices
for nameservices: nameservice1
2020-04-22 19:32:25,726 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering>
(Datanode Uuid unassigned) service to node238/172.16.2.238:8020 starting to
offer service
2020-04-22 19:32:25,727 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool <registering>
(Datanode Uuid unassigned) service to node239/172.16.2.239:8020 starting to
offer service
2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 50020: starting
2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2020-04-22 19:32:25,885 INFO org.apache.hadoop.hdfs.server.common.Storage:
kkk current thread ID: 30
2020-04-22 19:32:25,885 INFO org.apache.hadoop.hdfs.server.common.Storage:
kkk current thread: Thread[DataNode:
[[[DISK]file:/vdir/mnt/disk2/hadoop/data/]] heartbeating to node239/
172.16.2.239:8020,5,main]
2020-04-22 19:32:25,885 INFO
org.apache.hadoop.hdfs.server.common.Storage: Using
1 threads to upgrade data directories
(dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2020-04-22 19:32:25,890 INFO org.apache.hadoop.hdfs.server.common.Storage:
Lock on /vdir/mnt/disk2/hadoop/data/in_use.lock acquired by nodename
35@node238
2020-04-22 19:32:25,918 INFO org.apache.hadoop.hdfs.server.common.Storage:
Analyzing storage directories for bpid
BP-1558632922-172.16.2.238-1587535931027
2020-04-22 19:32:25,918 INFO org.apache.hadoop.hdfs.server.common.Storage:
Locking is disabled for
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
2020-04-22 19:32:25,920 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage:
nsid=613035685;bpid=BP-1558632922-172.16.2.238-1587535931027;lv=-56;nsInfo=lv=-63;cid=hdfs1;nsid=613035685;c=0;bpid=BP-1558632922-172.16.2.238-1587535931027;dnuuid=0b14101d-f4f7-4004-9eaf-ffd157d4aefe
2020-04-22 19:32:25,931 INFO org.apache.hadoop.hdfs.server.common.Storage:
kkk current thread ID: 29
2020-04-22 19:32:25,931 INFO org.apache.hadoop.hdfs.server.common.Storage:
kkk current thread: Thread[DataNode:
[[[DISK]file:/vdir/mnt/disk2/hadoop/data/]] heartbeating to node238/
172.16.2.238:8020,5,main]
2020-04-22 19:32:25,931 INFO
org.apache.hadoop.hdfs.server.common.Storage: Using
1 threads to upgrade data directories
(dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2020-04-22 19:32:25,932 INFO org.apache.hadoop.hdfs.server.common.Storage:
Storage directory [DISK]file:/vdir/mnt/disk2/hadoop/data/ has already been
used.
2020-04-22 19:32:25,950 INFO org.apache.hadoop.hdfs.server.common.Storage:
Analyzing storage directories for bpid
BP-1558632922-172.16.2.238-1587535931027
2020-04-22 19:32:25,950 WARN org.apache.hadoop.hdfs.server.common.Storage:
Failed to analyze storage directories for block pool
BP-1558632922-172.16.2.238-1587535931027
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt
to load an used block storage:
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
at
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:217)
at
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:482)
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:408)
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:587)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1397)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1362)
at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
at java.lang.Thread.run(Thread.java:748)
--
Best Regards!
Wenqi
Re: Datanode cannot start
Posted by Gurmukh Singh <gu...@yahoo.com.INVALID>.
Hi Wenqi,
Which version of hadoop are you using? (My initial guess is 3.X, as )
Can you see anything in the datanode logs on the lines of
"org.apache.hadoop.hdfs.server.common.InconsistentFSStateException"
Although, you mention that it is new cluster, make sure the ClusterID
matches. --> "attempt to load an used block storage:
/vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027"
On 22/4/20 9:50 pm, Wenqi Ma wrote:
> Dear Hadoop community,
>
> A strange problem happened in our fresh installed hdfs cluster. The
> datanode cannot start because two threads tried to load storage, the
> latter would fail and lead a FATAL error.
> As the following log shown, the blue text
> means DataStorage::addStorageLocations is called, the red text are I
> added to find the thread details.
> I compared the log with normal datanode's in another cluster, and
> there is only one line (blue text).
>
> Any suggestion would be appreciated, thanks!
>
> 2020-04-22 19:32:25,666 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Refresh request
> received for nameservices: nameservice1
> 2020-04-22 19:32:25,718 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Starting
> BPOfferServices for nameservices: nameservice1
> 2020-04-22 19:32:25,726 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
> <registering> (Datanode Uuid unassigned) service to
> node238/172.16.2.238:8020 <http://172.16.2.238:8020> starting to offer
> service
> 2020-04-22 19:32:25,727 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool
> <registering> (Datanode Uuid unassigned) service to
> node239/172.16.2.239:8020 <http://172.16.2.239:8020> starting to offer
> service
> 2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server
> listener on 50020: starting
> 2020-04-22 19:32:25,734 INFO org.apache.hadoop.ipc.Server: IPC Server
> Responder: starting
> 2020-04-22 19:32:25,885 INFO
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 30
> 2020-04-22 19:32:25,885 INFO
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread:
> Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]]
> heartbeating to node239/172.16.2.239:8020
> <http://172.16.2.239:8020>,5,main]
> 2020-04-22 19:32:25,885 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to
> upgrade data directories
> (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
> 2020-04-22 19:32:25,890 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Lock on
> /vdir/mnt/disk2/hadoop/data/in_use.lock acquired by nodename 35@node238
> 2020-04-22 19:32:25,918 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage
> directories for bpid BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,918 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Locking is disabled for
> /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,920 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage:
> nsid=613035685;bpid=BP-1558632922-172.16.2.238-1587535931027;lv=-56;nsInfo=lv=-63;cid=hdfs1;nsid=613035685;c=0;bpid=BP-1558632922-172.16.2.238-1587535931027;dnuuid=0b14101d-f4f7-4004-9eaf-ffd157d4aefe
> 2020-04-22 19:32:25,931 INFO
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread ID: 29
> 2020-04-22 19:32:25,931 INFO
> org.apache.hadoop.hdfs.server.common.Storage: kkk current thread:
> Thread[DataNode: [[[DISK]file:/vdir/mnt/disk2/hadoop/data/]]
> heartbeating to node238/172.16.2.238:8020
> <http://172.16.2.238:8020>,5,main]
> 2020-04-22 19:32:25,931 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to
> upgrade data directories
> (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
> 2020-04-22 19:32:25,932 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Storage directory
> [DISK]file:/vdir/mnt/disk2/hadoop/data/ has already been used.
> 2020-04-22 19:32:25,950 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage
> directories for bpid BP-1558632922-172.16.2.238-1587535931027
> 2020-04-22 19:32:25,950 WARN
> org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze
> storage directories for block pool
> BP-1558632922-172.16.2.238-1587535931027
> java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead:
> attempt to load an used block storage:
> /vdir/mnt/disk2/hadoop/data/current/BP-1558632922-172.16.2.238-1587535931027
> at
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:217)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:250)
> at
> org.apache.hadoop.hdfs.server.datanode.DataStorage.loadBlockPoolSliceStorage(DataStorage.java:482)
> at
> org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:408)
>
> at
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:587)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1397)
>
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1362)
>
> at
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:224)
> at
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
>
> at java.lang.Thread.run(Thread.java:748)
>
> --
> Best Regards!
> Wenqi
>