Posted to common-dev@hadoop.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2021/01/08 18:49:00 UTC
[jira] [Created] (HADOOP-17462) Hadoop Client getRpcResponse May Return Wrong Result
David Mollitor created HADOOP-17462:
---------------------------------------
Summary: Hadoop Client getRpcResponse May Return Wrong Result
Key: HADOOP-17462
URL: https://issues.apache.org/jira/browse/HADOOP-17462
Project: Hadoop Common
Issue Type: Improvement
Components: common
Reporter: David Mollitor
Assignee: David Mollitor
{code:java|Title=Client.java}
/** @return the rpc response or, in case of timeout, null. */
private Writable getRpcResponse(final Call call, final Connection connection,
    final long timeout, final TimeUnit unit) throws IOException {
  synchronized (call) {
    while (!call.done) {
      try {
        AsyncGet.Util.wait(call, timeout, unit);
        if (timeout >= 0 && !call.done) {
          return null;
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new InterruptedIOException("Call interrupted");
      }
    }
    ...
  }
}
...
static class Call {
  final int id;       // call id
  final int retry;    // retry count
  ...
  boolean done;       // true when call is done
  ...
}
{code}
The {{done}} field is not marked {{volatile}}, so the thread checking its status is free to cache the value and never reload it, even though a different thread is expected to change it. The while loop may be stuck waiting for a change while always reading the cached value.
In previous versions of Hadoop there was no timeout at this level, so this would cause an endless loop. It is a really tough error to track down if it happens.
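As a rough illustration of the proposed change, here is a minimal standalone sketch. It is not the Hadoop code: {{VolatileDoneSketch}}, {{awaitDone}}, {{complete}} and {{isDone}} are hypothetical names invented for this example, and plain {{Object.wait}} stands in for {{AsyncGet.Util.wait}}. It shows a {{done}} flag declared {{volatile}}, which makes its visibility independent of whether every reader and writer holds the call's monitor, together with a wait loop bounded by a deadline.

```java
import java.util.concurrent.TimeUnit;

final class VolatileDoneSketch {

  /** Hypothetical stand-in for Client.Call; only the done flag is modeled. */
  static class Call {
    // volatile: a write by the completing thread is visible to any later read
    private volatile boolean done;

    /** Marks the call as finished and wakes up any waiting thread. */
    synchronized void complete() {
      done = true;
      notifyAll();
    }

    boolean isDone() {
      return done;
    }
  }

  /** @return true if the call completed before the timeout, false otherwise. */
  static boolean awaitDone(Call call, long timeout, TimeUnit unit) {
    final long deadline = System.nanoTime() + unit.toNanos(timeout);
    synchronized (call) {
      while (!call.isDone()) {
        long remainingNanos = deadline - System.nanoTime();
        if (remainingNanos <= 0) {
          return false; // timed out
        }
        // Object.wait(0) would wait forever, so wait at least 1 ms.
        long millis = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
        try {
          call.wait(millis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          // Assumption: unchecked here for brevity; the code quoted in this
          // issue throws InterruptedIOException instead.
          throw new IllegalStateException("Call interrupted", ie);
        }
      }
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    Call call = new Call();
    Thread responder = new Thread(() -> {
      try {
        Thread.sleep(50); // simulate the response arriving later
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      call.complete();
    });
    responder.start();
    System.out.println(awaitDone(call, 5, TimeUnit.SECONDS));
    responder.join();
    // Nobody completes this call, so the wait gives up after the timeout.
    System.out.println(awaitDone(new Call(), 10, TimeUnit.MILLISECONDS));
  }
}
```

Run as-is, this is expected to print {{true}} for the call that gets completed and {{false}} for the one that times out. The loop re-reads the flag after every wake-up, so a spurious wake-up or an early notification does not produce a wrong result.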