Posted to user@hbase.apache.org by Ionut Ignatescu <io...@gmail.com> on 2013/10/02 19:15:47 UTC

HBase stuck because HDFS fails to replicate blocks

Hi,

I have a Hadoop & HBase cluster running Hadoop 1.1.2 and HBase 0.94.7.
I am hitting an issue that stops the cluster from running normally.
My use case: several MR jobs read data from one HBase table in the map
phase and write data into 3 different tables during the reduce phase. I
create the table handles on my own; I don't use TableOutputFormat. When
the issue hits, the only way out I have found is to restart the region
server daemon on the affected region server.
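
(Concretely, the restart is just the stock daemon script on the affected
node, something like the lines below; the exact path depends on where
HBase is installed on your nodes.)

# run on the region server that is stuck; $HBASE_HOME is wherever HBase
# lives on that node
$HBASE_HOME/bin/hbase-daemon.sh stop regionserver
$HBASE_HOME/bin/hbase-daemon.sh start regionserver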

On namenode:
cat namenode.2013-10-02 | grep "blk_3136705509461132997_43329"
Wed Oct 02 13:32:17 2013 GMT namenode 3852-0@namenode:0 [INFO] (IPC Server
handler 29 on 22700) org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.allocateBlock:
/hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247.
blk_3136705509461132997_43329
Wed Oct 02 13:33:38 2013 GMT namenode 3852-0@namenode:0 [INFO] (IPC Server
handler 13 on 22700) org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
commitBlockSynchronization(lastblock=blk_3136705509461132997_43329,
newgenerationstamp=43366, newlength=40045568, newtargets=[10.81.18.101:50010],
closeFile=false, deleteBlock=false)

On region server:
cat regionserver.2013-10-02 | grep "1380720737247"
Wed Oct 02 13:32:17 2013 GMT regionserver 5854-0@datanode1:0 [INFO]
(regionserver60020.logRoller)
org.apache.hadoop.hbase.regionserver.wal.HLog: Roll
/hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720701436,
entries=149, filesize=63934833.  for
/hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247
Wed Oct 02 13:33:37 2013 GMT regionserver 5854-0@datanode1:0 [WARN]
(DataStreamer for file
/hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247
block blk_3136705509461132997_43329) org.apache.hadoop.hdfs.DFSClient:
Error Recovery for block blk_3136705509461132997_43329 bad datanode[0]
10.80.40.176:50010
Wed Oct 02 13:33:37 2013 GMT regionserver 5854-0@datanode1:0 [WARN]
(DataStreamer for file
/hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247
block blk_3136705509461132997_43329) org.apache.hadoop.hdfs.DFSClient:
Error Recovery for block blk_3136705509461132997_43329 in pipeline
10.80.40.176:50010, 10.81.111.8:50010, 10.81.18.101:50010: bad datanode
10.80.40.176:50010
Wed Oct 02 13:33:43 2013 GMT regionserver 5854-0@datanode1:0 [INFO]
(regionserver60020.logRoller) org.apache.hadoop.hdfs.DFSClient: Could not
complete file
/hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247
retrying...
Wed Oct 02 13:33:43 2013 GMT regionserver 5854-0@datanode1:0 [INFO]
(regionserver60020.logRoller) org.apache.hadoop.hdfs.DFSClient: Could not
complete file
/hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247
retrying...
Wed Oct 02 13:33:44 2013 GMT regionserver 5854-0@datanode1:0 [INFO]
(regionserver60020.logRoller) org.apache.hadoop.hdfs.DFSClient: Could not
complete file
/hbase/.logs/datanode1,60020,1380637389766/datanode1%2C60020%2C1380637389766.1380720737247
retrying...

cat regionserver.2013-10-02 | grep "1380720737247" | grep 'Could not complete' | wc -l
5640


In the datanode log, on the datanode that runs on the same host as the region server:
cat datanode.2013-10-02 | grep "blk_3136705509461132997_43329"
Wed Oct 02 13:32:17 2013 GMT datanode 5651-0@datanode1:0 [INFO]
(org.apache.hadoop.hdfs.server.datanode.DataXceiver@ca6b1e3)
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_3136705509461132997_43329 src: /10.80.40.176:36721 dest: /
10.80.40.176:50010
Wed Oct 02 13:33:37 2013 GMT datanode 5651-0@datanode1:0 [INFO]
(org.apache.hadoop.hdfs.server.datanode.DataXceiver@ca6b1e3)
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.80.40.176:50010,
storageID=DS-812180968-10.80.40.176-50010-1380263000454, infoPort=50075,
ipcPort=50020): Exception writing block blk_3136705509461132997_43329 to
mirror 10.81.111.8:50010
Wed Oct 02 13:33:37 2013 GMT datanode 5651-0@datanode1:0 [INFO]
(org.apache.hadoop.hdfs.server.datanode.DataXceiver@ca6b1e3)
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
for block blk_3136705509461132997_43329 java.io.IOException: Connection
reset by peer
Wed Oct 02 13:33:38 2013 GMT datanode 5651-0@datanode1:0 [INFO]
(PacketResponder 2 for Block blk_3136705509461132997_43329)
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_3136705509461132997_43329 2 : Thread is interrupted.
Wed Oct 02 13:33:38 2013 GMT datanode 5651-0@datanode1:0 [INFO]
(PacketResponder 2 for Block blk_3136705509461132997_43329)
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for
block blk_3136705509461132997_43329 terminating
Wed Oct 02 13:33:38 2013 GMT datanode 5651-0@datanode1:0 [INFO]
(org.apache.hadoop.hdfs.server.datanode.DataXceiver@ca6b1e3)
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_3136705509461132997_43329 received exception java.io.IOException:
Connection reset by peer

On datanode2, which datanode1 asked to replicate the WAL block:
cat datanode.2013-10-02 | grep "blk_3136705509461132997_43329"
Wed Oct 02 13:32:17 2013 GMT datanode 5986-0@datanode2:0 [INFO]
(org.apache.hadoop.hdfs.server.datanode.DataXceiver@6afc190b)
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_3136705509461132997_43329 src: /10.80.40.176:57279 dest: /
10.81.111.8:50010
Wed Oct 02 13:33:37 2013 GMT datanode 5986-0@datanode2:0 [INFO] (IPC Server
handler 18 on 50020) org.apache.hadoop.hdfs.server.datanode.DataNode:
Client calls recoverBlock(block=blk_3136705509461132997_43329, targets=[
10.81.111.8:50010, 10.81.18.101:50010])
Wed Oct 02 13:33:37 2013 GMT datanode 5986-0@datanode2:0 [WARN]
(PacketResponder 1 for Block blk_3136705509461132997_43329)
org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in
BlockReceiver.run():  { java.nio.channels.ClosedChannelException | at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:135) |
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:326) | at
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
| at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
| at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
| at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
| at java.io.DataOutputStream.writeLong(DataOutputStream.java:207) | at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.write(DataTransferProtocol.java:135)
| at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:939)
| at java.lang.Thread.run(Thread.java:662) }
Wed Oct 02 13:33:37 2013 GMT datanode 5986-0@datanode2:0 [INFO]
(PacketResponder 1 for Block blk_3136705509461132997_43329)
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder
blk_3136705509461132997_43329 1 Exception
java.nio.channels.ClosedChannelException
Wed Oct 02 13:33:37 2013 GMT datanode 5986-0@datanode2:0 [INFO]
(org.apache.hadoop.hdfs.server.datanode.DataXceiver@6afc190b)
org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock
for block blk_3136705509461132997_43329
java.nio.channels.ClosedByInterruptException
Wed Oct 02 13:33:37 2013 GMT datanode 5986-0@datanode2:0 [INFO]
(org.apache.hadoop.hdfs.server.datanode.DataXceiver@6afc190b)
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_3136705509461132997_43329 received exception java.io.IOException:
Interrupted receiveBlock
Wed Oct 02 13:33:38 2013 GMT datanode 5986-0@datanode2:0 [INFO]
(PacketResponder 1 for Block blk_3136705509461132997_43329)
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 1 for
block blk_3136705509461132997_43329 terminating
Wed Oct 02 13:33:38 2013 GMT datanode 5986-0@datanode2:0 [WARN] (IPC Server
handler 18 on 50020)
org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to
getBlockMetaDataInfo for block (=blk_3136705509461132997_43329) from
datanode (=10.81.111.8:50010) { java.io.IOException: Block
blk_3136705509461132997_43329 length is 39980544 does not match block file
length 40045568 | at
org.apache.hadoop.hdfs.server.datanode.FSDataset.validateBlockMetadata(FSDataset.java:1861)
| at
org.apache.hadoop.hdfs.server.datanode.FSDataset.startBlockRecovery(FSDataset.java:2166)
| at
org.apache.hadoop.hdfs.server.datanode.DataNode.startBlockRecovery(DataNode.java:1783)
| at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1972)
| at
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:2114)
| at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) | at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
| at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
| at java.lang.reflect.Method.invoke(Method.java:597) | at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578) | at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393) | at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389) | at
java.security.AccessController.doPrivileged(Native Method) | at
javax.security.auth.Subject.doAs(Subject.java:396) | at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
| at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387) }

For me, this sounds like an HDFS issue, but I cannot find the root cause.
Any idea is welcome.
I checked OS file descriptors and the HDFS xcievers limit
(dfs.datanode.max.xcievers); everything looks OK.
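
(For reference, those checks were roughly the following; the config path
below assumes a stock Hadoop 1.x layout, so adjust it to your install.)

# open file descriptor limit of the running DataNode process
# (assumes a single DataNode JVM on the host)
cat /proc/$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)/limits | grep 'open files'
# configured xcievers limit (the property name really is spelled this way)
grep -A1 'dfs.datanode.max.xcievers' $HADOOP_HOME/conf/hdfs-site.xml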

Thanks in advance,
Ionut I.

Re: HBase stuck because HDFS fails to replicate blocks

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I like the way you were able to dig down into multiple logs and present
the information to us, but this looks more like GC than an HDFS failure.
In your region server log, go back to the first FATAL and see whether the
server got a session expiry from ZooKeeper, along with other messages such
as a client not being able to talk to a server for some amount of time. If
that's the case, then what you are seeing is the result of IO fencing by
the master.
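
Something like the following should surface it quickly (the exact wording
of the ZK messages varies a bit between versions, so treat the patterns as
a starting point):

# first FATAL in the region server log for that day
grep -n FATAL regionserver.2013-10-02 | head -1
# ZooKeeper session expiry messages around the same time
grep -in 'session expired' regionserver.2013-10-02 | head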

J-D

