Posted to common-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/03/05 00:44:45 UTC
[jira] Commented: (HADOOP-6762) exception while doing RPC I/O closes channel
[ https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002871#comment-13002871 ]
Todd Lipcon commented on HADOOP-6762:
-------------------------------------
Hi Sam,
We saw the following deadlock which I think is related to this patch:
{noformat}
Thread 50994 (IPC Client (47) connection to XXXXXXXXX:8020 from hdfs):
State: BLOCKED
Blocked count: 7168
Waited count: 7122
Blocked on java.io.DataOutputStream@2e932fec
Blocked by 50828 (sendParams-14)
Stack:
org.apache.hadoop.ipc.Client$Connection.sendPing(Client.java:676)
org.apache.hadoop.ipc.Client$Connection.access$400(Client.java:210)
org.apache.hadoop.ipc.Client$Connection$PingInputStream.handleTimeout(Client.java:340)
org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:370)
java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
java.io.BufferedInputStream.read(BufferedInputStream.java:237)
java.io.DataInputStream.readInt(DataInputStream.java:370)
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:781)
org.apache.hadoop.ipc.Client$Connection.run(Client.java:689)
Thread 50828 (sendParams-14):
State: BLOCKED
Blocked count: 54
Waited count: 8313
Blocked on org.apache.hadoop.ipc.Client$Connection@54aeb3dd
Blocked by 50994 (IPC Client (47) connection to XXXXXX:8020 from hdfs)
Stack:
org.apache.hadoop.ipc.Client$Connection.markClosed(Client.java:809)
org.apache.hadoop.ipc.Client$Connection.access$1200(Client.java:210)
org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:745)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
java.util.concurrent.FutureTask.run(FutureTask.java:138)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:619)
{noformat}
The issue is that in the patch, we have the following inverted lock orders:
sendParam's senderFuture: holds Connection.out, then acquires Connection (in markClosed)
sendPing: holds Connection, then acquires Connection.out (explicit sync)
Have you guys seen this issue?
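For illustration, here is a minimal sketch of one way to avoid the inversion described above: the sender path never calls markClosed while it still holds the output-stream lock, so the only nested acquisition left is Connection -> out. This is not the Hadoop code and not necessarily the fix that was applied to the patch. The class LockOrderSketch, its fields, the failWrite flag, and runConcurrently are all hypothetical stand-ins for Client.Connection, Connection.out, and the senderFuture.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for Client.Connection; not the real Hadoop class.
class LockOrderSketch {
    private final Object connection = new Object(); // plays the Connection monitor
    private final Object out = new Object();        // plays Connection.out
    private boolean closed = false;                 // guarded by connection
    private int pingsSent = 0;                      // guarded by out

    // Same order as sendPing in the trace: Connection first, then out.
    void sendPing() {
        synchronized (connection) {
            synchronized (out) {
                if (!closed) {
                    pingsSent++; // placeholder for writing the ping
                }
            }
        }
    }

    // Sender path: do the write under out, remember a failure,
    // and call markClosed only after out has been released.
    void sendParam(boolean failWrite) {
        boolean failed;
        synchronized (out) {
            failed = failWrite; // placeholder for a write that may throw
        }
        if (failed) {
            markClosed(); // takes connection while holding no other lock
        }
    }

    void markClosed() {
        synchronized (connection) {
            closed = true;
        }
    }

    boolean isClosed() {
        synchronized (connection) {
            return closed;
        }
    }

    // Runs both paths against each other; returns true if both loops finish
    // within the timeout, false if they appear stuck or the wait is interrupted.
    static boolean runConcurrently(int iterations) {
        LockOrderSketch c = new LockOrderSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> { for (int i = 0; i < iterations; i++) c.sendPing(); });
        pool.execute(() -> { for (int i = 0; i < iterations; i++) c.sendParam(true); });
        pool.shutdown();
        try {
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runConcurrently(100000));
    }
}
```

If sendParam instead called markClosed inside its synchronized (out) block, the two threads could each hold one lock and wait on the other, which is the shape of the thread dump above.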
> exception while doing RPC I/O closes channel
> --------------------------------------------
>
> Key: HADOOP-6762
> URL: https://issues.apache.org/jira/browse/HADOOP-6762
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 0.20.2
> Reporter: sam rash
> Assignee: sam rash
> Priority: Critical
> Fix For: 0.22.0
>
> Attachments: hadoop-6762-1.txt, hadoop-6762-10.txt, hadoop-6762-2.txt, hadoop-6762-3.txt, hadoop-6762-4.txt, hadoop-6762-6.txt, hadoop-6762-7.txt, hadoop-6762-8.txt, hadoop-6762-9.txt
>
>
> If a single process creates two unique FileSystems to the same NN using FileSystem.newInstance(), and one of them issues a close(), the leasechecker thread is interrupted. This interrupt races with the RPC namenode.renew() and can cause a ClosedByInterruptException. This closes the underlying channel, and the other filesystem, which shares the connection, will get errors.