Posted to common-issues@hadoop.apache.org by "Colin Patrick McCabe (JIRA)" <ji...@apache.org> on 2015/04/09 00:18:12 UTC

[jira] [Comment Edited] (HADOOP-11802) DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback

    [ https://issues.apache.org/jira/browse/HADOOP-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486155#comment-14486155 ] 

Colin Patrick McCabe edited comment on HADOOP-11802 at 4/8/15 10:18 PM:
------------------------------------------------------------------------

I thought about this a little bit more, and I wonder whether this finally block inside requestShortCircuitShm is causing a "double removal":

{code}
  public void requestShortCircuitShm(String clientName) throws IOException {
    NewShmInfo shmInfo = null;
    boolean success = false;
    DomainSocket sock = peer.getDomainSocket();
    try {
...
    } finally {
...
      if ((!success) && (peer == null)) {
        // If we failed to pass the shared memory segment to the client,
        // close the UNIX domain socket now.  This will trigger the
        // DomainSocketWatcher callback, cleaning up the segment.
        IOUtils.cleanup(null, sock);
      }
      IOUtils.cleanup(null, shmInfo);
    }
{code}

Closing the socket will remove that shmID, but so will closing the NewShmInfo object... let me look into this.

[edit: NewShmInfo#close closes only the shared memory segment, not the domain socket. Since DomainSocketWatcher is watching the domain socket rather than the shm fd, doing both close operations should not be a problem.]
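
For illustration only, here is a minimal sketch of what a "double removal" would look like in general, assuming a registry whose remove method guards itself with a state check (as the "failed to remove" message in the stack trace on this issue suggests). It is not the Hadoop code: DoubleRemovalSketch, Registry, register and the "shm-1" id are hypothetical stand-ins, and only the shape of the removeShm guard is borrowed from the trace.

```java
import java.util.HashMap;
import java.util.Map;

public class DoubleRemovalSketch {
  /** Hypothetical stand-in for a registry of shared memory segments. */
  static class Registry {
    private final Map<String, Object> segments = new HashMap<>();

    void register(String shmId) {
      segments.put(shmId, new Object());
    }

    /** Throws if the id is not registered, like a checkState-style guard. */
    void removeShm(String shmId) {
      if (segments.remove(shmId) == null) {
        throw new IllegalStateException("failed to remove " + shmId);
      }
    }
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    registry.register("shm-1");
    registry.removeShm("shm-1");   // first removal succeeds
    try {
      registry.removeShm("shm-1"); // second removal trips the guard
    } catch (IllegalStateException e) {
      System.out.println("second removal failed: " + e.getMessage());
    }
  }
}
```

If two independent cleanup paths both ended up calling the remove method for the same id, the second call would fail in this way. Per the edit above, closing NewShmInfo does not appear to be such a path.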


> DomainSocketWatcher#watcherThread can encounter IllegalStateException in finally block when calling sendCallback
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11802
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11802
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> In the main finally block of {{DomainSocketWatcher#watcherThread}}, the call to {{sendCallback}} can encounter an {{IllegalStateException}} and leave some cleanup tasks undone.
> {code}
>       } finally {
>         lock.lock();
>         try {
>           kick(); // allow the handler for notificationSockets[0] to read a byte
>           for (Entry entry : entries.values()) {
>             // We do not remove from entries as we iterate, because that can
>             // cause a ConcurrentModificationException.
>             sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
>           }
>           entries.clear();
>           fdSet.close();
>         } finally {
>           lock.unlock();
>         }
>       }
> {code}
> The exception causes {{watcherThread}} to skip the calls to {{entries.clear()}} and {{fdSet.close()}}.
> {code}
> 2015-04-02 11:48:09,941 [DataXceiver for client unix:/home/gs/var/run/hdfs/dn_socket [Waiting for operation #1]] INFO DataNode.clienttrace: cliID: DFSClient_NONMAPREDUCE_-807148576_1, src: 127.0.0.1, dest: 127.0.0.1, op: REQUEST_SHORT_CIRCUIT_SHM, shmId: n/a, srvID: e6b6cdd7-1bf8-415f-a412-32d8493554df, success: false
> 2015-04-02 11:48:09,941 [Thread-14] ERROR unix.DomainSocketWatcher: Thread[Thread-14,5,main] terminating on unexpected exception
> java.lang.IllegalStateException: failed to remove b845649551b6b1eab5c17f630e42489d
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>         at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.removeShm(ShortCircuitRegistry.java:119)
>         at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry$RegisteredShm.handle(ShortCircuitRegistry.java:102)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.sendCallback(DomainSocketWatcher.java:402)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher.access$1100(DomainSocketWatcher.java:52)
>         at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:522)
>         at java.lang.Thread.run(Thread.java:722)
> {code}
> Please note that this is not a duplicate of HADOOP-11333, HADOOP-11604, or HADOOP-10404. The cluster installation is running code with all of these fixes.
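
The skipped-cleanup behaviour described in the issue can be reproduced in isolation. The following is a minimal sketch, not the DomainSocketWatcher code and not the patch for this issue: FinallyCleanupSketch, Watcher, Handler, fdSetClosed and both shutdown methods are hypothetical names chosen for the example. The unguarded method has the same shape as the quoted finally block. The guarded method shows one possible way, assumed here for illustration, to make sure the cleanup still runs.

```java
import java.util.ArrayList;
import java.util.List;

public class FinallyCleanupSketch {
  /** Hypothetical callback that may throw, like a handler behind sendCallback. */
  interface Handler {
    void handle();
  }

  static class Watcher {
    final List<Handler> entries = new ArrayList<>();
    boolean fdSetClosed = false;

    /** Same shape as the reported code: a throwing handler skips the cleanup. */
    void shutdownUnguarded() {
      for (Handler h : entries) {
        h.handle();
      }
      entries.clear();
      fdSetClosed = true;
    }

    /** Remembers the first failure, always cleans up, then rethrows. */
    void shutdownGuarded() {
      RuntimeException first = null;
      try {
        for (Handler h : entries) {
          try {
            h.handle();
          } catch (RuntimeException e) {
            if (first == null) {
              first = e;
            }
          }
        }
      } finally {
        entries.clear();
        fdSetClosed = true;
      }
      if (first != null) {
        throw first;
      }
    }
  }

  /** Builds a watcher with one well-behaved and one failing handler. */
  static Watcher newWatcher() {
    Watcher w = new Watcher();
    w.entries.add(() -> { });
    w.entries.add(() -> {
      throw new IllegalStateException("failed to remove segment");
    });
    return w;
  }

  public static void main(String[] args) {
    Watcher unguarded = newWatcher();
    try {
      unguarded.shutdownUnguarded();
    } catch (IllegalStateException e) {
      System.out.println("unguarded: fdSetClosed=" + unguarded.fdSetClosed);
    }

    Watcher guarded = newWatcher();
    try {
      guarded.shutdownGuarded();
    } catch (IllegalStateException e) {
      System.out.println("guarded: fdSetClosed=" + guarded.fdSetClosed);
    }
  }
}
```

In the unguarded case the exception propagates before the list is cleared and the flag is set. In the guarded case both cleanup steps run and the exception is still reported to the caller. Whether rethrowing, logging, or swallowing is appropriate in the real watcher thread is a separate design question.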


