You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2019/05/31 20:51:01 UTC

[jira] [Reopened] (HADOOP-16266) Add more fine-grained processing time metrics to the RPC layer

     [ https://issues.apache.org/jira/browse/HADOOP-16266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran reopened HADOOP-16266:
-------------------------------------

This seems to be the patch which is breaking my test runs on a PR now rebased to trunk

{{org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitMRJob}} works by bringing up a miniYarn cluster, except now it doesn't. Instead I get a stack trace (shortened)
{code}
2019-05-31 21:31:05,544 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,554 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,554 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,554 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,554 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,554 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,554 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,555 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,555 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,542 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,555 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,556 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,556 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,557 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,557 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,557 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,557 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,557 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,557 [IPC Server Responder] WARN  ipc.Server (Server.java:doRunLoop(1546)) - Exception in Responder
java.io.IOException: Invalid argument
	at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
	at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
	at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at org.apache.hadoop.ipc.Server$Responder.doRunLoop(Server.java:1485)
	at org.apache.hadoop.ipc.Server$Responder.run(Server.java:1468)
2019-05-31 21:31:05,557 [IPC Server Responder] WARN  ipc.Server (Server
{code}

* This is happening in arg validation in the select() call, which is being caught, swallowed and repeated, forever.
* Searching for "java.io.IOException: Invalid argument" + select I see this surfaces when the timeout is too big [https://bugs.openjdk.java.net/browse/JDK-8165000]
* And looking at our code, I can see this happening in writeSelector.select(PURGE_INTERVAL_NANOS). 
* which is a method that takes milliseconds, not nanos.

This leads to a hypothesis "this patch passes a timeout 10e6 times too big, the native code rejects and then server just retries forever hoping it will somehow recover"

I'm reopening and making it your homework to fix. Sorry.

> Add more fine-grained processing time metrics to the RPC layer
> --------------------------------------------------------------
>
>                 Key: HADOOP-16266
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16266
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Christopher Gregorian
>            Assignee: Christopher Gregorian
>            Priority: Minor
>              Labels: rpc
>             Fix For: 3.3.0
>
>         Attachments: HADOOP-16266.001.patch, HADOOP-16266.002.patch, HADOOP-16266.003.patch, HADOOP-16266.004.patch, HADOOP-16266.005.patch, HADOOP-16266.006.patch, HADOOP-16266.007.patch, HADOOP-16266.008.patch, HADOOP-16266.009.patch, HADOOP-16266.010.patch, HADOOP-16266.011.patch
>
>
> Splitting off of HDFS-14403 to track the first part: introduces more fine-grained measuring of how a call's processing time is split up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org