Posted to common-dev@hadoop.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2008/07/02 02:30:44 UTC

[jira] Created: (HADOOP-3681) Infinite loop in dfs close

Infinite loop in dfs close
--------------------------

                 Key: HADOOP-3681
                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
             Project: Hadoop Core
          Issue Type: Bug
    Affects Versions: 0.17.0
            Reporter: Koji Noguchi


We had a dfsClient -put hang, outputting:

{noformat}
2008-06-28 10:05:12,595 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException:
timed out waiting for rpc response
2008-06-28 10:05:12,595 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block null bad datanode[0]
2008-06-28 10:05:51,067 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file
/_temporary/_task_200806262325_4136_r_000408_0/part-00408
retrying...
2008-06-28 10:05:52,898 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file
/_temporary/_task_200806262325_4136_r_000408_0/part-00408
retrying...
2008-06-28 10:05:54,893 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file
/_temporary/_task_200806262325_4136_r_000408_0/part-00408
retrying...
2008-06-28 10:05:56,920 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file
/_temporary/_task_200806262325_4136_r_000408_0/part-00408
retrying...
2008-06-28 10:05:57,765 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file
/_temporary/_task_200806262325_4136_r_000408_0/part-00408
retrying...
2008-06-28 10:05:58,199 INFO org.apache.hadoop.dfs.DFSClient: Could not complete file
/_temporary/_task_200806262325_4136_r_000408_0/part-00408
retrying...
[repeats forever]
{noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610420#action_12610420 ] 

Koji Noguchi commented on HADOOP-3681:
--------------------------------------

+1 Tested manually (throwing IOException in the middle).

With the second patch:
{noformat}
bash-3.00$ ls -l testfile
-rw-r--r--  1 knoguchi users 75396 Jul  4 01:46 testfile
bash-3.00$ $HADOOP_HOME/bin/hadoop dfs -put testfile  /user/knoguchi
08/07/04 02:13:43 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: testing
08/07/04 02:13:43 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
bash-3.00$ echo $?
0
bash-3.00$ $HADOOP_HOME/bin/hadoop dfs -ls /user/knoguchi
Found 1 items
/user/knoguchi/testfile <r 1>   0       2008-07-04 02:13        rw-r--r--       knoguchi        supergroup
{noformat}

With the third patch:
{noformat}
bash-3.00$ $HADOOP_HOME/bin/hadoop dfs -put testfile  /user/knoguchi
08/07/04 02:15:03 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: testing
08/07/04 02:15:03 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
put: Could not get block locations. Aborting...
Exception closing file /user/knoguchi/testfile
java.io.IOException: Could not get block locations. Aborting...
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2084)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1822)
bash-3.00$ echo $?
255
{noformat}


> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch
>
>


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3681:
------------------------------------

    Priority: Blocker  (was: Major)

Should have been promoted earlier.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>            Priority: Blocker
>             Fix For: 0.17.2, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch
>
>


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609754#action_12609754 ] 

Koji Noguchi commented on HADOOP-3681:
--------------------------------------

Stack trace of the close call.
The DataStreamer thread doesn't show up in the jstack output.

{noformat}
"main" prio=10 tid=0x0805a800 nid=0x17a1 waiting on condition [0xf7e6c000..0xf7e6d1f8]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:2658)
        - locked <0xd524fc08> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:2576)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:59)
        at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:79)
        at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:94)
        - locked <0xd524fce0> (a org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:398)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
{noformat}

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Attachment: HADOOP-3681-3-18.patch

Yes, there could be an exception thrown before locateFollowingBlock. So we have to check lastException before setting close to true by calling isClosed(). As you said, after flush is the right place. This updated patch also changes one test case which used to dump the exception to stdout.
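
A minimal sketch of the idea described above, not the actual patch: the background thread records its failure, and close() rethrows it after flush and before marking the stream closed, so the caller gets an error instead of retrying forever. The class SketchOutputStream and the method recordFailure are hypothetical names made up for illustration; only lastException and isClosed() echo names mentioned in this thread, and how the real DFSClient wires them together is not shown here.

```java
import java.io.IOException;

// Hypothetical stand-in for an output stream with a background sender thread.
// None of this is copied from DFSClient; it only illustrates the ordering.
class SketchOutputStream {
    // Set by the background thread when it fails; read by the writer thread.
    private volatile IOException lastException;
    private volatile boolean closed;

    // Called by the background thread when it hits an unrecoverable error.
    void recordFailure(IOException e) {
        lastException = e;
    }

    // Placeholder flush: there is nothing to send in this sketch.
    void flush() { }

    boolean isClosed() {
        return closed;
    }

    // Flush, rethrow any recorded failure, and only then mark the stream closed.
    // Assumption: on failure this sketch leaves 'closed' false.
    void close() throws IOException {
        flush();
        IOException e = lastException;
        if (e != null) {
            throw e;
        }
        closed = true;
    }

    public static void main(String[] args) {
        SketchOutputStream out = new SketchOutputStream();
        out.recordFailure(new IOException("testing"));
        try {
            out.close();
            System.out.println("closed normally");
        } catch (IOException e) {
            System.out.println("close failed: " + e.getMessage());
        }
    }
}
```

The point of the ordering is that a failure recorded at any time before close() reaches the check is surfaced to the caller, which matches the non-zero exit status reported with the third patch.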

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-18.patch
>
>


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3681:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>            Priority: Blocker
>             Fix For: 0.17.2, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch, HADOOP-3681-3-19.patch
>
>


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611249#action_12611249 ] 

dhruba borthakur commented on HADOOP-3681:
------------------------------------------

The third patch looks good. +1.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch
>
>


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610268#action_12610268 ] 

Koji Noguchi commented on HADOOP-3681:
--------------------------------------

Thanks Lohit! 

In the DataStreamer thread, can we catch Throwable instead of IOException? 
If OutOfMemoryError is thrown, it'll still hang.
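
To illustrate the concern, here is a small sketch (an assumption-laden example, not DFSClient code) of a worker thread that catches Throwable, so an Error such as OutOfMemoryError is recorded for the waiting thread rather than silently killing the worker. StreamerSketch and runAndCapture are hypothetical names, and the OutOfMemoryError is simulated by throwing it directly.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical worker wrapper; the names here are invented for illustration.
class StreamerSketch {
    // Runs the task on a worker thread and returns the failure it recorded,
    // or null if the task finished without throwing.
    static Throwable runAndCapture(Runnable task) throws InterruptedException {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } catch (Throwable t) {
                // Catching Throwable covers Errors as well as Exceptions,
                // so the waiting thread can learn that the worker died.
                failure.set(t);
            }
        }, "streamer-sketch");
        worker.start();
        worker.join();
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated error: thrown directly rather than by exhausting memory.
        Throwable t = runAndCapture(() -> { throw new OutOfMemoryError("simulated"); });
        System.out.println("recorded: " + t);
    }
}
```

With a catch clause limited to IOException, the same task would end the worker thread without recording anything, and a caller polling for completion would have no way to notice.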

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch
>
>


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624797#action_12624797 ] 

Hudson commented on HADOOP-3681:
--------------------------------

Integrated in Hadoop-trunk #581 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/])

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>            Priority: Blocker
>             Fix For: 0.17.2
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch, HADOOP-3681-3-19.patch
>
>


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611776#action_12611776 ] 

Raghu Angadi commented on HADOOP-3681:
--------------------------------------

I just committed this. Thanks Lohit!

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>            Priority: Blocker
>             Fix For: 0.17.2, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch, HADOOP-3681-3-19.patch
>
>


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated HADOOP-3681:
---------------------------------

    Component/s: dfs

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Status: Patch Available  (was: Open)

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>            Priority: Blocker
>             Fix For: 0.17.2, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch, HADOOP-3681-3-19.patch
>
>


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Status: Open  (was: Patch Available)

Looks like Hudson is not running core tests. I see the same failure on another patch as well.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch
>
>


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611659#action_12611659 ] 

Hadoop QA commented on HADOOP-3681:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12385262/HADOOP-3681-3-17.patch
  against trunk revision 674834.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2812/console

This message is automatically generated.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>            Priority: Blocker
>             Fix For: 0.17.2, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Attachment: HADOOP-3681-1.patch

Thanks, Koji. I was also able to reproduce this by throwing an exception after locateFollowingBlock. It looks like this is what happened:
- DFSClient timed out getting a new block from the namenode while the namenode was busy. In this case, though, the namenode did allocate a block on behalf of the client.
- The timeout raised an exception, locateFollowingBlock propagated it, and the streamer was eventually closed.
- By then closeInternal had already gone past isClosed() and was trying to complete the file.
- The namenode still had a connection to the client, so it did not expire the lease.

The suggested fix is to call isClosed() while trying to complete the file. I tested this manually: it throws the exception stored in lastException and terminates the client.
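
The idea behind the suggested fix can be sketched roughly as follows. This is only an illustration, not the attached patch: the class CompleteRetrySketch, the Namenode interface, and the streamerFailed() and completeFile() methods are made-up stand-ins, and only the names isClosed() and lastException come from the comment above.

```java
import java.io.IOException;

/** Hypothetical stand-in for the client's close path; not the real DFSClient code. */
class CompleteRetrySketch {

    /** Minimal namenode stand-in: complete() returns false until the file can be closed. */
    interface Namenode {
        boolean complete(String src);
    }

    private volatile boolean closed = false;
    private volatile IOException lastException = null;

    /** Called when the (simulated) streamer fails: remember why and mark the stream closed. */
    void streamerFailed(IOException e) {
        lastException = e;
        closed = true;
    }

    /** Same idea as the isClosed() named above: rethrow the stored failure once closed. */
    void isClosed() throws IOException {
        if (closed) {
            throw (lastException != null) ? lastException : new IOException("Stream closed");
        }
    }

    /** Retries complete(), re-checking the closed state on every pass; returns attempts made. */
    int completeFile(Namenode namenode, String src) throws IOException {
        int attempts = 0;
        while (true) {
            isClosed();              // the added check: a failed streamer ends the loop
            attempts++;
            if (namenode.complete(src)) {
                return attempts;
            }
            // A real client would sleep between retries; omitted to keep the sketch fast.
        }
    }

    public static void main(String[] args) {
        CompleteRetrySketch out = new CompleteRetrySketch();
        int[] calls = {0};
        try {
            // The namenode never completes the file; the streamer fails during the third call.
            out.completeFile(src -> {
                if (++calls[0] == 3) {
                    out.streamerFailed(new IOException("timed out waiting for rpc response"));
                }
                return false;
            }, "/tmp/part-00000");
        } catch (IOException e) {
            System.out.println("close failed after " + calls[0] + " attempts: " + e.getMessage());
        }
    }
}
```

Without the isClosed() call inside the loop, the same simulated namenode would keep the loop spinning indefinitely, which matches the "Could not complete file ... retrying..." output in the report.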

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Status: Open  (was: Patch Available)

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>            Priority: Blocker
>             Fix For: 0.17.2, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch, HADOOP-3681-3-19.patch


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610316#action_12610316 ] 

Hadoop QA commented on HADOOP-3681:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12385226/HADOOP-3681-2.patch
  against trunk revision 673517.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2793/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2793/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2793/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2793/console

This message is automatically generated.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609755#action_12609755 ] 

Koji Noguchi commented on HADOOP-3681:
--------------------------------------

fsck -files -locations -blocks /_temporary/_task_200806262325_4136_r_000408_0/part-00408

shows:
{noformat}
/_temporary/_task_200806262325_4136_r_000408_0/part-00408
0 bytes, 1 block(s):  Replica placement policy is violated for blk_4878955496501003583. Block should be additionally
replicated on 2 more rack(s).
 MISSING 1 blocks of total size 0 B
0. blk_4878955496501003583 len=0 MISSING!

Status: CORRUPT
{noformat}

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3681:
------------------------------------

    Fix Version/s: 0.18.0
                   0.17.1
         Assignee: Lohit Vijayarenu

Need patches for both 17 and 18, if different.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Attachment: HADOOP-3681-3-19.patch

It looks like the patch build picked up the latest file even though it was not marked for inclusion, so it tried to apply the 0.17 patch to trunk. Reattaching the 0.19 patch and retrying Hudson.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>            Priority: Blocker
>             Fix For: 0.17.2, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch, HADOOP-3681-3-19.patch


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610110#action_12610110 ] 

Koji Noguchi commented on HADOOP-3681:
--------------------------------------

Trying to reproduce.

1) Intentionally fail DataStreamer by throwing an IOException right AFTER
{noformat}
2219         lb = locateFollowingBlock(startTime);
{noformat}

2) Add Thread.sleep(1000) at the top of the DataStreamer thread's run()
so that DataStreamer fails after the flushInternal() line
{noformat}
2524           isClosed();
{noformat}


This reproduces the hang.

Also, if DataStreamer throws the IOException BEFORE that line, dfs -put returns '0' but ends up with an empty file.
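
The ordering that the two steps above force can be simulated outside Hadoop roughly as below. This is a sketch under assumptions, not DFSClient code: CloseRaceSketch and its methods are invented for illustration, a CountDownLatch takes the place of the Thread.sleep(1000) so the ordering is deterministic, and the retry loop is capped by maxSpins so the demo terminates where the real client would not.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

/** Hypothetical simulation of the check-then-fail ordering; not the real DFSClient. */
class CloseRaceSketch {
    private volatile IOException lastException;
    private final CountDownLatch closedCheckPassed = new CountDownLatch(1);
    private final CountDownLatch streamerDone = new CountDownLatch(1);

    /** Throws the stored failure, if the simulated streamer has recorded one. */
    void isClosed() throws IOException {
        if (lastException != null) {
            throw lastException;
        }
    }

    /** Simulated streamer: fails only after the closing thread has passed its check. */
    Thread startStreamer() {
        Thread t = new Thread(() -> {
            try {
                closedCheckPassed.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            lastException = new IOException("injected failure after locateFollowingBlock");
            streamerDone.countDown();
        });
        t.start();
        return t;
    }

    /** Simulated close: one up-front check, then a retry loop that never looks again. */
    int closeWithoutRecheck(int maxSpins) throws IOException, InterruptedException {
        isClosed();                    // passes: the streamer has not failed yet
        closedCheckPassed.countDown(); // now let the streamer fail
        streamerDone.await();
        int spins = 0;
        while (spins < maxSpins) {     // stands in for the unbounded "retrying..." loop
            spins++;                   // complete() would return false here every time
        }
        return spins;
    }

    public static void main(String[] args) throws Exception {
        CloseRaceSketch s = new CloseRaceSketch();
        Thread streamer = s.startStreamer();
        int spins = s.closeWithoutRecheck(5);
        streamer.join();
        System.out.println("retried " + spins + " times; failure recorded: "
                + (s.lastException != null));
    }
}
```

The loop runs to its cap even though a failure has already been recorded, because nothing inside the loop consults it.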

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>         Attachments: H-3681-jstack.txt


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Status: Patch Available  (was: Open)

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Attachment: HADOOP-3681-2.patch

Thanks Koji, Dhruba.
Another try with this new patch. Here, we catch Throwable, and isClosed() is called only after we have tried to complete the file 10 times. I think we should allow completing the file to be retried a few times before checking isClosed().
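
A rough sketch of the retry-then-check idea, for illustration only. It does not reproduce the attached patch: DeferredClosedCheckSketch, completeWithGrace(), and the ClosedCheck interface are hypothetical names, the threshold of 10 is taken from the comment above, and the Throwable handling is left out.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: retry complete() a few times before consulting the closed check. */
class DeferredClosedCheckSketch {
    static final int RETRIES_BEFORE_CHECK = 10; // value from the comment above

    /** Stand-in for isClosed(): throws if the stream has already failed. */
    interface ClosedCheck {
        void check() throws IOException;
    }

    /** Returns the number of attempts used, or throws once the grace period is over. */
    static int completeWithGrace(BooleanSupplier complete, ClosedCheck isClosed)
            throws IOException {
        int attempts = 0;
        while (true) {
            attempts++;
            if (complete.getAsBoolean()) {
                return attempts;
            }
            if (attempts >= RETRIES_BEFORE_CHECK) {
                isClosed.check(); // ends the loop if the streamer recorded a failure
            }
        }
    }

    public static void main(String[] args) {
        try {
            completeWithGrace(() -> false,
                    () -> { throw new IOException("stream already failed"); });
        } catch (IOException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

Note that the sketch still loops forever if complete() keeps returning false and the closed check never throws; it only covers the case discussed in this issue, where the streamer has already failed.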

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch


[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Status: Patch Available  (was: Open)

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>             Fix For: 0.17.2, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch


[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610243#action_12610243 ] 

dhruba borthakur commented on HADOOP-3681:
------------------------------------------

Hi Lohit, +1 on your patch.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch
>
>



[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated HADOOP-3681:
---------------------------------

    Attachment: H-3681-jstack.txt

Attaching a jstack of the DFS client.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>         Attachments: H-3681-jstack.txt
>
>



[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609775#action_12609775 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3681:
------------------------------------------------

Since the length of blk_4878955496501003583 is 0, could we let fsck delete the block? Then namenode.complete(...) would return normally.

Although this does not fix the bug, it provides an option for system recovery.
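
To illustrate why the client keeps printing "Could not complete file ... retrying..." for as long as complete(...) returns false, here is a minimal sketch. It is not the DFSClient code and not what the attached patches do: the Completer interface, the method name, and the deadline are all hypothetical, added only to show a polling loop that ends either when the call succeeds or when a time limit passes.

```java
import java.io.IOException;

class CompleteRetrySketch {
    /** Hypothetical stand-in for the namenode call; true means the file could be closed. */
    interface Completer {
        boolean complete(String src) throws IOException;
    }

    /**
     * Polls complete() until it returns true or maxWaitMs elapses.
     * Returns the number of attempts made; throws once the deadline has passed.
     * Without the deadline check, a call that never returns true loops forever.
     */
    static int completeWithDeadline(Completer namenode, String src,
                                    long sleepMs, long maxWaitMs) throws IOException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        int attempts = 0;
        while (true) {
            attempts++;
            if (namenode.complete(src)) {
                return attempts;
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new IOException("Could not complete file " + src
                        + " after " + attempts + " attempts");
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up instead of retrying.
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while completing " + src);
            }
        }
    }
}
```

In this sketch, deleting the zero-length block corresponds to the point where complete() starts returning true, at which the loop exits on its own.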


> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>         Attachments: H-3681-jstack.txt
>
>



[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Attachment: HADOOP-3681-3-19.patch

Same patch for version 0.19.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch
>
>



[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610322#action_12610322 ] 

Koji Noguchi commented on HADOOP-3681:
--------------------------------------

Lohit, with your second patch, if the DataStreamer thread throws an exception before locateFollowingBlock(startTime), hadoop dfs -put can incorrectly succeed and leave an empty DFS file.

I don't know the details of DFS well enough, but maybe we need to check for the error after the queue is emptied? Can we call isClosed() at the bottom of flushInternal?
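
A rough sketch of that idea, not taken from the patch: every name below (FlushCheckSketch, enqueue, streamerSent, streamerFailed, checkNotFailed) is hypothetical, and checkNotFailed() only stands in for what I assume isClosed() does, namely throw the recorded exception if the stream has failed. The point is the second check after the wait loop, so that a failure in the streamer thread cannot look like a successful flush.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayDeque;
import java.util.Queue;

class FlushCheckSketch {
    private final Queue<byte[]> dataQueue = new ArrayDeque<>();
    private IOException lastException; // recorded by the streamer thread on failure
    private boolean closed;

    synchronized void enqueue(byte[] packet) {
        dataQueue.add(packet);
        notifyAll();
    }

    /** Streamer thread: one packet was sent successfully. */
    synchronized void streamerSent() {
        dataQueue.poll();
        notifyAll();
    }

    /** Streamer thread: record the failure, drop pending packets, wake waiters. */
    synchronized void streamerFailed(IOException e) {
        lastException = e;
        closed = true;
        dataQueue.clear();
        notifyAll();
    }

    /** Hypothetical stand-in for isClosed(): throws if the stream has failed. */
    private synchronized void checkNotFailed() throws IOException {
        if (closed) {
            throw lastException != null ? lastException : new IOException("Stream closed");
        }
    }

    synchronized void flushInternal() throws IOException {
        checkNotFailed();
        while (!closed && !dataQueue.isEmpty()) {
            try {
                wait(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("Interrupted while flushing");
            }
        }
        // An empty queue alone does not prove the data was written:
        // the streamer may have cleared it while failing. Check again.
        checkNotFailed();
    }
}
```

Without the final check, flushInternal() would return normally after streamerFailed() empties the queue, which is the "succeeds with an empty file" case described above.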

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch
>
>



[jira] Updated: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3681:
-------------------------------------

    Attachment: HADOOP-3681-3-17.patch

Same patch for version 0.17.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>             Fix For: 0.17.1, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch
>
>



[jira] Commented: (HADOOP-3681) Infinite loop in dfs close

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611308#action_12611308 ] 

Lohit Vijayarenu commented on HADOOP-3681:
------------------------------------------

I ran the tests against trunk and branch-0.18 on my local machine. Both runs passed all tests.

> Infinite loop in dfs close
> --------------------------
>
>                 Key: HADOOP-3681
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3681
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Koji Noguchi
>            Assignee: Lohit Vijayarenu
>            Priority: Blocker
>             Fix For: 0.17.2, 0.18.0
>
>         Attachments: H-3681-jstack.txt, HADOOP-3681-1.patch, HADOOP-3681-2.patch, HADOOP-3681-3-17.patch, HADOOP-3681-3-18.patch, HADOOP-3681-3-19.patch
>
>
