Posted to common-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/08 18:54:00 UTC

[jira] [Work logged] (HADOOP-17462) Hadoop Client getRpcResponse May Return Wrong Result

     [ https://issues.apache.org/jira/browse/HADOOP-17462?focusedWorklogId=533172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-533172 ]

ASF GitHub Bot logged work on HADOOP-17462:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Jan/21 18:53
            Start Date: 08/Jan/21 18:53
    Worklog Time Spent: 10m 
      Work Description: belugabehr opened a new pull request #2610:
URL: https://github.com/apache/hadoop/pull/2610


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 533172)
    Remaining Estimate: 0h
            Time Spent: 10m

> Hadoop Client getRpcResponse May Return Wrong Result
> ----------------------------------------------------
>
>                 Key: HADOOP-17462
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17462
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:java|Title=Client.java}
>   /** @return the rpc response or, in case of timeout, null. */
>   private Writable getRpcResponse(final Call call, final Connection connection,
>       final long timeout, final TimeUnit unit) throws IOException {
>     synchronized (call) {
>       while (!call.done) {
>         try {
>           AsyncGet.Util.wait(call, timeout, unit);
>           if (timeout >= 0 && !call.done) {
>             return null;
>           }
>         } catch (InterruptedException ie) {
>           Thread.currentThread().interrupt();
>           throw new InterruptedIOException("Call interrupted");
>         }
>       }
> ...
>   static class Call {
>     final int id;               // call id
>     final int retry;           // retry count
> ...
>     boolean done;               // true when call is done
> ...
> }
> {code}
> The {{done}} field is not marked {{volatile}}, so the thread that checks it is free to cache the value and never reload it, even though another thread is expected to change it.  The while loop may therefore keep waiting on a stale cached value even after the call has completed.  If that happens, the timeout eventually expires and the method returns {{null}}.
> In previous versions of Hadoop there was no timeout at this level, so the same condition would cause an endless loop.  This is a really tough error to track down if it happens.
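
The change described above can be sketched as follows. This is a minimal, self-contained illustration and not the patch from pull request #2610: the class VolatileDoneSketch, its nested Call, and the methods complete, await and demo are hypothetical stand-ins for Client.Call and getRpcResponse, and the plain JDK wait/notify calls replace AsyncGet.Util.wait. It only shows the flag declared volatile and a wait loop that returns null on timeout.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual Hadoop Client code.
final class VolatileDoneSketch {

    /** Stand-in for Client.Call with only the fields needed here. */
    static final class Call {
        // volatile: a write by the completing thread is visible to any reader,
        // including reads made without holding the monitor.
        private volatile boolean done;
        private Object response;

        /** Called by the thread that receives the response. */
        synchronized void complete(Object value) {
            response = value;
            done = true;
            notifyAll(); // wake any thread blocked in await()
        }

        /** Returns the response, or null if the timeout elapses first. */
        synchronized Object await(long timeout, TimeUnit unit)
                throws InterruptedException {
            long deadline = System.nanoTime() + unit.toNanos(timeout);
            while (!done) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return null; // timed out and the call is still not done
                }
                // Releases the monitor while waiting; loops again on wake-up.
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return response;
        }
    }

    /** Runs one timed-out wait and one completed wait. */
    static String demo() {
        Call call = new Call();
        try {
            // Nobody completes the call yet, so this wait times out.
            Object first = call.await(20, TimeUnit.MILLISECONDS);
            Thread completer = new Thread(() -> call.complete("response"));
            completer.start();
            Object second = call.await(5, TimeUnit.SECONDS);
            completer.join();
            return first + "," + second;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "null,response"
    }
}
```

In this sketch both methods are synchronized on the call, which already orders the write to done before a later read under the same monitor. The volatile modifier additionally covers any read of done made without holding that monitor.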


