Posted to common-dev@hadoop.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2007/09/18 00:11:43 UTC

[jira] Created: (HADOOP-1911) infinite loop in dfs -cat command.

infinite loop in dfs -cat command.
----------------------------------

                 Key: HADOOP-1911
                 URL: https://issues.apache.org/jira/browse/HADOOP-1911
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.13.1
            Reporter: Koji Noguchi


[knoguchi]$ hadoop dfs -cat fileA
07/09/13 17:36:02 INFO fs.DFSClient: Could not obtain block 0 from any node: 
java.io.IOException: No live nodes contain current block
07/09/13 17:36:20 INFO fs.DFSClient: Could not obtain block 0 from any node: 
java.io.IOException: No live nodes contain current block
[repeats forever]

Setting one of the debug statements to WARN, it kept showing
{noformat} 
 WARN org.apache.hadoop.fs.DFSClient: Failed to connect to /99.99.999.9 :11111:java.io.IOException: Recorded block size is 7496, but datanode reports size of 0
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:690)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:771)
	at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:41)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.hadoop.fs.ChecksumFileSystem$FSInputChecker.<init>(ChecksumFileSystem.java:123)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:340)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:259)
	at org.apache.hadoop.util.CopyFiles$FSCopyFilesMapper.map(CopyFiles.java:466)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:186)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1707)
{noformat} 


It turns out fileA was corrupted. Fsck showed a crc file of 7496 bytes, but when I searched for the blocks on each node, all 3 replicas were size 0.

Not sure how it got corrupted, but it would be nice if the dfs command failed instead of getting into an infinite loop.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1911) infinite loop in dfs -cat command.

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated HADOOP-1911:
---------------------------------

        Fix Version/s: 0.16.0
    Affects Version/s: 0.14.3

This bug still happens after the 0.14 upgrade.

If such a file is part of a distcp, the job never finishes.



[jira] Commented: (HADOOP-1911) infinite loop in dfs -cat command.

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12528186 ] 

dhruba borthakur commented on HADOOP-1911:
------------------------------------------

chooseDataNode() has a bug that is triggered when all the replicas of a file are bad. The value of "failures" in DFSClient.chooseDataNode is always zero: when there are no more good nodes, bestNode() throws an exception that is caught inside chooseDataNode, but "failures" is never incremented. The method then clears the dead-node list, refetches the block locations, and starts all over again. Hence the infinite loop.
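
To make the mechanism concrete, here is a minimal sketch of the loop being described. This is an illustration only, not the actual DFSClient source; every name in it (bestNode, refetchBlockLocations, deadNodes, MAX_BLOCK_ACQUIRE_FAILURES) is a hypothetical stand-in.

{noformat}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Illustration of the retry bug described above; hypothetical names,
// not the real org.apache.hadoop.dfs.DFSClient code.
class RetryLoopSketch {
    private static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;
    private final Set<String> deadNodes = new HashSet<String>();

    String chooseDataNode(String block) throws IOException {
        int failures = 0;  // bug: never incremented, so the guard below never fires
        while (true) {
            try {
                return bestNode(block, deadNodes);  // throws once no live node remains
            } catch (IOException e) {
                if (failures >= MAX_BLOCK_ACQUIRE_FAILURES) {
                    throw new IOException("Could not obtain block " + block);
                }
                // failures++;  <-- the missing increment that would bound the loop
                deadNodes.clear();             // forget every node that already failed,
                refetchBlockLocations(block);  // refetch locations from the namenode,
                                               // and retry the same all-bad replicas
            }
        }
    }

    private String bestNode(String block, Set<String> dead) throws IOException {
        // In this report all 3 replicas are zero length, so every datanode
        // ends up in the dead set and this always throws.
        throw new IOException("No live nodes contain current block");
    }

    private void refetchBlockLocations(String block) {
        // ask the namenode for the block's locations again
    }
}
{noformat}

Incrementing "failures" before clearing the dead-node list would make the guard fire after a few passes and surface the IOException to the caller, turning the infinite loop into a bounded retry.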

