Posted to common-dev@hadoop.apache.org by "Suman Sehgal (JIRA)" <ji...@apache.org> on 2008/09/24 16:48:44 UTC

[jira] Created: (HADOOP-4264) DFSIO is failing on 500 nodes cluster

DFSIO is failing on 500 nodes cluster
-------------------------------------

                 Key: HADOOP-4264
                 URL: https://issues.apache.org/jira/browse/HADOOP-4264
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred, test
    Affects Versions: 0.19.0
            Reporter: Suman Sehgal


On executing the following command:
bin/hadoop jar ~/hadoop/hadoop-0.19.0-test.jar TestDFSIO -write -nrFiles 990 -fileSize 320     

This error occurs:
08/09/24 06:15:03 INFO mapred.JobClient:  map 98% reduce 32%
java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1201)
	at org.apache.hadoop.fs.TestDFSIO.runIOTest(TestDFSIO.java:236)
	at org.apache.hadoop.fs.TestDFSIO.writeTest(TestDFSIO.java:218)
	at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:354)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:77)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
	at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

Looking at the Hadoop logs, it seems that file names are clashing:

2008-09-24 06:21:41,618 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200809240600_0005_m_000802_2_1222236048515' from 'tracker_xxxx/client x.x.x.x:xxxxx'
2008-09-24 06:21:41,627 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200809240600_0005_m_000802_4_1222236048515' to tip task_200809240600_0005_m_000802, for tracker 'tracker_xxxx/client x.x.x.x:xxxxx'
2008-09-24 06:21:41,627 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200809240600_0005_m_000802
2008-09-24 06:21:41,724 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200809240600_0005_m_000900_2_1222236048515: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_20 for DFSClient_attempt_200809240600_0005_m_000900_2_1222236048515 on client client x.x.x.x, because this file is already being created by DFSClient_attempt_200809240600_0005_m_000900_0_1222236048515 on client x.x.x.x
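
To make the clash in the last log line easier to see, here is a minimal sketch (not part of Hadoop) that splits the two attempt IDs from that line into task and attempt number. The field layout is an assumption inferred only from the log lines above, the trailing numeric field is treated as opaque, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttemptIds {
    // Assumed layout, inferred only from the log lines in this report:
    // attempt_<jobtracker id>_<job>_<m|r>_<task>_<attempt>_<trailing field>
    private static final Pattern ATTEMPT =
        Pattern.compile("attempt_(\\d+)_(\\d+)_([mr])_(\\d+)_(\\d+)_(\\d+)");

    private static Matcher parse(String id) {
        // The namenode message prefixes the attempt ID with "DFSClient_".
        String bare = id.startsWith("DFSClient_") ? id.substring("DFSClient_".length()) : id;
        Matcher m = ATTEMPT.matcher(bare);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized attempt ID: " + id);
        }
        return m;
    }

    /** Task the attempt belongs to, in the "task_..." form the JobTracker logs. */
    static String taskOf(String id) {
        Matcher m = parse(id);
        return "task_" + m.group(1) + "_" + m.group(2) + "_" + m.group(3) + "_" + m.group(4);
    }

    /** Attempt number within that task. */
    static int attemptNumberOf(String id) {
        return Integer.parseInt(parse(id).group(5));
    }

    public static void main(String[] args) {
        String failed = "DFSClient_attempt_200809240600_0005_m_000900_2_1222236048515";
        String holder = "DFSClient_attempt_200809240600_0005_m_000900_0_1222236048515";
        System.out.println(taskOf(failed) + " attempt " + attemptNumberOf(failed));
        System.out.println(taskOf(holder) + " attempt " + attemptNumberOf(holder));
        System.out.println("same task: " + taskOf(failed).equals(taskOf(holder)));
    }
}
```

Both IDs resolve to the same map task (m_000900) with attempt numbers 2 and 0, so two attempts of one task are asking for /benchmarks/TestDFSIO/io_data/test_io_20.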



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4264) DFSIO is failing on 500 nodes cluster

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634393#action_12634393 ] 

Amareshwari Sriramadasu commented on HADOOP-4264:
-------------------------------------------------

Suman, can you try running the test again with speculative execution off, by setting the configuration properties *mapred.map.tasks.speculative.execution* and *mapred.reduce.tasks.speculative.execution* to *false*?



[jira] Updated: (HADOOP-4264) DFSIO is failing on 500 nodes cluster

Posted by "Suman Sehgal (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suman Sehgal updated HADOOP-4264:
---------------------------------

    Component/s:     (was: mapred)
                 io



[jira] Commented: (HADOOP-4264) DFSIO is failing on 500 nodes cluster

Posted by "Suman Sehgal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635132#action_12635132 ] 

Suman Sehgal commented on HADOOP-4264:
--------------------------------------

I tried this test with the above-mentioned configuration, i.e. with mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution set to false, but it is still failing.



[jira] Commented: (HADOOP-4264) DFSIO is failing on 500 nodes cluster

Posted by "Suman Sehgal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636100#action_12636100 ] 

Suman Sehgal commented on HADOOP-4264:
--------------------------------------

After discussing the issue with the mapred team, it was found that the AlreadyBeingCreatedException was caused by the following issue:

2008-09-26 19:51:46,821 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xxx.xx.xxx:xxxxx remote=/xx.xxx.xx.xxx:xxxxx]
2008-09-26 19:51:46,821 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-3654405428661288966_4855
2008-09-26 19:51:46,823 INFO org.apache.hadoop.hdfs.DFSClient: Waiting to find target node: xx.xxx.xx.xxx:xxxxx
2008-09-26 19:52:11,790 WARN org.apache.hadoop.mapred.TaskRunner: Parent died.  Exiting attempt_200809261929_0005_m_000001_0_1222457384229
2008-09-26 19:53:16,581 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block blk_-3774088676509031650_4855java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xxx.xx.xxx:xxxxx remote=/xx.xxx.xx.xxx:xxxxx]



[jira] Commented: (HADOOP-4264) DFSIO is failing on 500 nodes cluster

Posted by "Suman Sehgal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636099#action_12636099 ] 

Suman Sehgal commented on HADOOP-4264:
--------------------------------------

It's giving the same AlreadyBeingCreatedException:
2008-09-26 19:52:18,590 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200809261929_0005_m_000001_0_1222457384229: Task attempt_200809261929_0005_m_000001_0_1222457384229 failed to report status for 603 seconds. Killing!
2008-09-26 19:52:18,591 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200809261929_0005_m_000001_0_1222457384229' from 'tracker_host:host/xx.xxx.xx.xxx:xxxxx'

2008-09-26 19:57:24,546 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200809261929_0005_m_000001_1_1222457384229: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /benchmarks/TestDFSIO/io_data/test_io_101 for DFSClient_attempt_200809261929_0005_m_000001_1_1222457384229 on client xx.xxx.xx.xxx, because this file is already being created by DFSClient_attempt_200809261929_0005_m_000001_0_1222457384229 on xx.xxx.xx.xxx

        





[jira] Commented: (HADOOP-4264) DFSIO is failing on 500 nodes cluster

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634392#action_12634392 ] 

Amareshwari Sriramadasu commented on HADOOP-4264:
-------------------------------------------------

TestDFSIO creates its files in DATA_DIR, which is not the output directory of the mapreduce job. If speculative execution is on for the test, two attempts of the same task try to create the same file. This test should run with speculative execution off.
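
The collision described above can be illustrated with a toy model. This is only a sketch under stated assumptions: LeaseToy is a hypothetical stand-in for the namenode's "file is already being created by client X" bookkeeping, not HDFS code, and the attempt-scoped paths at the end are made up purely to show why per-attempt locations would not clash.

```java
import java.util.HashMap;
import java.util.Map;

class LeaseToy {
    // path -> client currently creating it (toy model, no expiry or recovery)
    private final Map<String, String> holders = new HashMap<>();

    /** Returns true if the client may create the path, false if another client holds it. */
    boolean create(String path, String client) {
        String current = holders.putIfAbsent(path, client);
        return current == null || current.equals(client);
    }

    public static void main(String[] args) {
        LeaseToy fs = new LeaseToy();
        String attempt0 = "DFSClient_attempt_200809240600_0005_m_000900_0_1222236048515";
        String attempt2 = "DFSClient_attempt_200809240600_0005_m_000900_2_1222236048515";

        // Fixed file name in a shared data directory: the second attempt is refused.
        String shared = "/benchmarks/TestDFSIO/io_data/test_io_20";
        System.out.println(fs.create(shared, attempt0)); // true
        System.out.println(fs.create(shared, attempt2)); // false

        // Hypothetical attempt-scoped names: each attempt gets its own file.
        System.out.println(fs.create("/tmp/example/attempt_0/test_io_20", attempt0)); // true
        System.out.println(fs.create("/tmp/example/attempt_2/test_io_20", attempt2)); // true
    }
}
```

The same refusal occurs whenever an earlier attempt still counts as the creator of the file when a later attempt of the same task starts, whatever the reason the later attempt was launched.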



[jira] Commented: (HADOOP-4264) DFSIO is failing on 500 nodes cluster

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636105#action_12636105 ] 

Devaraj Das commented on HADOOP-4264:
-------------------------------------

Could it be because of lack of progress reporting while the task is waiting for the datanode?
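
If that is the cause, one common pattern is to keep signalling liveness from a helper thread while the main thread is blocked. The sketch below only illustrates that pattern and is not a proposed patch: the Progress interface, the class name, and the period are all hypothetical, and in a real map task the callback would be whatever the framework provides for reporting progress.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class ProgressWhileBlocked {
    /** Hypothetical callback that tells the framework the task is still alive. */
    interface Progress {
        void progress();
    }

    /** Runs work on the calling thread while a helper thread calls reporter every periodMillis. */
    static <T> T callWithProgress(Supplier<T> work, Progress reporter, long periodMillis) {
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        ticker.scheduleAtFixedRate(reporter::progress, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            return work.get();
        } finally {
            ticker.shutdownNow(); // stop reporting as soon as the work returns or throws
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger ticks = new AtomicInteger();
        String result = callWithProgress(() -> {
            sleepQuietly(200); // stands in for a write stalled on a slow datanode
            return "written";
        }, ticks::incrementAndGet, 20);
        System.out.println(result + ", progress calls: " + ticks.get());
    }
}
```

The trade-off is that a helper thread keeps reporting even when the write is genuinely hung, so this can mask a stuck task rather than fix it.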
