Posted to common-user@hadoop.apache.org by Jason Venner <ja...@attributor.com> on 2007/12/04 19:06:43 UTC

Has anyone had hdfs block move synchronization failures with hadoop 0.15.0?

We have a small cluster of 9 machines on a shared gigabit switch (with a lot 
of other machines on the same switch).

The other day, while running a job, the reduce phase stalled when the map was 
99.99x% done.
7 of the 9 machines were idle, and 2 of the machines were using 100% of 
1 CPU (1 job per machine).

So it appears that there was a synchronization failure, in that one 
machine thought the transfer hadn't started and the other machine 
thought it had.

We did have a momentary network outage on the switch during this job. We 
tried stopping the hadoop processes on the machines with the sending 
failures, and after 10 minutes they went 'dead' but the job never resumed.

Looking into the log files of the spinning machines, we saw that they were 
endlessly trying to start a block move to any of a set of other machines in 
the cluster. The shape of their repeating log messages is shown below.

2007-12-03 15:42:44,755 INFO org.apache.hadoop.dfs.DataNode: Starting 
thread to transfer block blk_3105072074036734167 to 
[Lorg.apache.hadoop.dfs.DatanodeInfo;@6fc40f
2007-12-03 15:42:44,757 WARN org.apache.hadoop.dfs.DataNode: Failed to 
transfer blk_3105072074036734167 to XX.YY.ZZ.AAA:50010 got 
java.net.SocketException: Broken pipe
       at java.net.SocketOutputStream.socketWrite0(Native Method)
       at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
       at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
       at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
       at java.io.DataOutputStream.write(DataOutputStream.java:90)
       at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1175)
       at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1208)
       at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1460)
       at java.lang.Thread.run(Thread.java:619)


-- On the machines that the transfers were targeted to, the following 
was in the log file.

2007-12-03 15:42:18,508 ERROR org.apache.hadoop.dfs.DataNode: 
DataXceiver: java.io.IOException: Block blk_3105072074036734167 has 
already been started (though not completed), and thus cannot be created.
       at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:568)
       at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1257)
       at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:901)
       at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
       at java.lang.Thread.run(Thread.java:619)
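
To make the failure mode concrete, here is a small stand-alone sketch (plain 
Java written for this mail, not the actual Hadoop 0.15 code; the class and 
method names are made up) of what the two sides appear to be doing to each 
other: the receiving datanode marked the block as "being created" when the 
first, interrupted transfer started, that state was never cleared after the 
outage, and every later attempt is rejected, so the sender spins retrying.

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only -- not the Hadoop source.
public class BlockMoveRaceSketch {

    // Receiver side: tracks blocks that have a write in progress.
    static class Receiver {
        private final Set<String> blocksBeingWritten = new HashSet<String>();

        void startWrite(String blockId) throws IOException {
            if (!blocksBeingWritten.add(blockId)) {
                throw new IOException("Block " + blockId
                        + " has already been started (though not completed),"
                        + " and thus cannot be created.");
            }
        }

        // Never reached for the stuck block: the first transfer died with a
        // broken pipe before the write completed, leaving the flag set.
        void finishWrite(String blockId) {
            blocksBeingWritten.remove(blockId);
        }
    }

    public static void main(String[] args) throws Exception {
        Receiver receiver = new Receiver();
        String block = "blk_3105072074036734167";

        // First attempt: the write starts, then the connection drops before
        // finishWrite() is ever called.
        receiver.startWrite(block);

        // Sender side: every later attempt is rejected, so it keeps retrying.
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                receiver.startWrite(block);
                receiver.finishWrite(block);
                break;
            } catch (IOException e) {
                System.out.println("attempt " + attempt + " rejected: "
                        + e.getMessage());
                Thread.sleep(100); // back off and try again, as the sender does
            }
        }
    }
}

Running it prints the same rejection on every attempt, which matches the 
endlessly repeating WARN/ERROR pairs in the logs above.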


Re: Has anyone had hdfs block move synchronization failures with hadoop 0.15.0?

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Thanks. I thought this might have been caused by the network outage you 
mentioned. If this is repeatable, please file a JIRA with any details on 
how to reproduce it.

Raghu.

Jason Venner wrote:
> This failure seems to be repeatable with this job and this cluster.
> I reran, and had the same problem, 2 machines unable to transfer some 
> blocks.
> 
> I have a mapper, a combiner and a reducer. My combiner results in about 
> a 4 to 1 reduction in data volumes.
> 
> This is the same job that shows up with the slow reducer transfer rates 
> I have asked about earlier.
> 
> reduce > copy (643 of 789 at 0.12 MB/s) >
> reduce > copy (656 of 789 at 0.12 MB/s) >
> reduce > copy (644 of 789 at 0.12 MB/s) >
> reduce > copy (644 of 789 at 0.12 MB/s) >
> reduce > copy (656 of 789 at 0.12 MB/s) >
> reduce > copy (656 of 789 at 0.12 MB/s) >
> reduce > copy (643 of 789 at 0.12 MB/s) >
> reduce > copy (623 of 789 at 0.12 MB/s) >
> reduce > copy (621 of 789 at 0.12 MB/s) >
> 
> Raghu Angadi wrote:
>>
>> I would think after an hour or so things are OK, but that might not 
>> have helped the job.
>>
>> Raghu.
>>
>> Jason Venner wrote:
>>> We have a small cluster of 9 machines on a shared Gig Switch (with a 
>>> lot of other machines)
>>>
>>> The other day, running a job, the reduce stalled, when the map was 
>>> 99.99x% done.
>>> 7 of the 9 machines were idle, and 2 of the machines were using 100% 
>>> of 1 cpu (1 job per machine).
>>>
>>> So it appears that there was a synchronization failure, in that one 
>>> machine thought the transfer hadn't started and the other machine 
>>> thought it had.
>>>
>>> We did have a momentary network outage on the switch during this job. 
>>> We tried stopping the hadoop processes on the machines with the 
>>> sending failures, and after 10 minutes they went 'dead' but the job 
>>> never resumed.
>>>
>>> Looking into the log files of the spinning machines, they were 
>>> endlessly trying to start a block move to any of a set of other 
>>> machines in the cluster. The shape of their repeating log messages is 
>>> below.
>>>
>>> 2007-12-03 15:42:44,755 INFO org.apache.hadoop.dfs.DataNode: Starting 
>>> thread to transfer block blk_3105072074036734167 to 
>>> [Lorg.apache.hadoop.dfs.DatanodeInfo;@6fc40f
>>> 2007-12-03 15:42:44,757 WARN org.apache.hadoop.dfs.DataNode: Failed 
>>> to transfer blk_3105072074036734167 to XX.YY.ZZ.AAA:50010 got 
>>> java.net.SocketException: Broken pipe
>>>       at java.net.SocketOutputStream.socketWrite0(Native Method)
>>>       at 
>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>>       at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>>       at 
>>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>>       at 
>>> java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>>       at 
>>> org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1175)
>>>       at 
>>> org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1208)
>>>       at 
>>> org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1460)
>>>       at java.lang.Thread.run(Thread.java:619)
>>>
>>>
>>> -- On the machines that the transfers were targeted to, the following 
>>> was in the log file.
>>>
>>> 2007-12-03 15:42:18,508 ERROR org.apache.hadoop.dfs.DataNode: 
>>> DataXceiver: java.io.IOException: Block blk_3105072074036734167 has 
>>> already been started (though not completed), and thus cannot be created.
>>>       at 
>>> org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:568)
>>>       at 
>>> org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1257)
>>>       at 
>>> org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:901)
>>>       at 
>>> org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
>>>       at java.lang.Thread.run(Thread.java:619)
>>>
>>


Re: Has anyone had hdfs block move synchronization failures with hadoop 0.15.0?

Posted by Jason Venner <ja...@attributor.com>.
This failure seems to be repeatable with this job and this cluster.
I reran the job and hit the same problem: 2 machines unable to transfer some 
blocks.

I have a mapper, a combiner and a reducer. My combiner results in about 
a 4 to 1 reduction in data volumes.
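
For context, the job is wired up roughly like this (a sketch against the 
0.15-era mapred API; our real mapper, combiner and reducer are specific to 
our data, so the stock TokenCountMapper/LongSumReducer classes stand in for 
them here, and the job name is made up):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.TokenCountMapper;

// Sketch of the job shape only: map -> combine -> reduce.
public class JobShapeSketch {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(JobShapeSketch.class);
        conf.setJobName("block-move-repro");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        conf.setMapperClass(TokenCountMapper.class);
        // The combiner is what cuts shuffle volume (about 4:1 for our job).
        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);

        conf.setInputPath(new Path(args[0]));   // 0.15-era JobConf path setters
        conf.setOutputPath(new Path(args[1]));

        JobClient.runJob(conf);
    }
}

The point is only the map -> combine -> reduce shape; nothing about the 
wiring looks unusual.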

This is the same job that shows the slow reducer transfer rates 
I have asked about earlier.

reduce > copy (643 of 789 at 0.12 MB/s) >
reduce > copy (656 of 789 at 0.12 MB/s) >
reduce > copy (644 of 789 at 0.12 MB/s) >
reduce > copy (644 of 789 at 0.12 MB/s) >
reduce > copy (656 of 789 at 0.12 MB/s) >
reduce > copy (656 of 789 at 0.12 MB/s) >
reduce > copy (643 of 789 at 0.12 MB/s) >
reduce > copy (623 of 789 at 0.12 MB/s) >
reduce > copy (621 of 789 at 0.12 MB/s) >

Raghu Angadi wrote:
>
> I would think after an hour or so things are OK, but that might not 
> have helped the job.
>
> Raghu.
>
> Jason Venner wrote:
>> We have a small cluster of 9 machines on a shared Gig Switch (with a 
>> lot of other machines)
>>
>> The other day, running a job, the reduce stalled, when the map was 
>> 99.99x% done.
>> 7 of the 9 machines were idle, and 2 of the machines were using 100% 
>> of 1 cpu (1 job per machine).
>>
>> So it appears that there was a synchronization failure, in that one 
>> machine thought the transfer hadn't started and the other machine 
>> thought it had.
>>
>> We did have a momentary network outage on the switch during this job. 
>> We tried stopping the hadoop processes on the machines with the 
>> sending failures, and after 10 minutes they went 'dead' but the job 
>> never resumed.
>>
>> Looking into the log files of the spinning machines, they were 
>> endlessly trying to start a block move to any of a set of other 
>> machines in the cluster. The shape of their repeating log messages is 
>> below.
>>
>> 2007-12-03 15:42:44,755 INFO org.apache.hadoop.dfs.DataNode: Starting 
>> thread to transfer block blk_3105072074036734167 to 
>> [Lorg.apache.hadoop.dfs.DatanodeInfo;@6fc40f
>> 2007-12-03 15:42:44,757 WARN org.apache.hadoop.dfs.DataNode: Failed 
>> to transfer blk_3105072074036734167 to XX.YY.ZZ.AAA:50010 got 
>> java.net.SocketException: Broken pipe
>>       at java.net.SocketOutputStream.socketWrite0(Native Method)
>>       at 
>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>       at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>       at 
>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>       at 
>> java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>       at 
>> org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1175)
>>       at 
>> org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1208)
>>       at 
>> org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1460)
>>       at java.lang.Thread.run(Thread.java:619)
>>
>>
>> -- On the machines that the transfers were targeted to, the following 
>> was in the log file.
>>
>> 2007-12-03 15:42:18,508 ERROR org.apache.hadoop.dfs.DataNode: 
>> DataXceiver: java.io.IOException: Block blk_3105072074036734167 has 
>> already been started (though not completed), and thus cannot be created.
>>       at 
>> org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:568)
>>       at 
>> org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1257)
>>       at 
>> org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:901)
>>       at 
>> org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
>>       at java.lang.Thread.run(Thread.java:619)
>>
>

Re: Has anyone had hdfs block move synchronization failures with hadoop 0.15.0?

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
I would think that after an hour or so things are OK, but that might not 
have helped the job.

Raghu.

Jason Venner wrote:
> We have a small cluster of 9 machines on a shared Gig Switch (with a lot 
> of other machines)
> 
> The other day, running a job, the reduce stalled, when the map was 
> 99.99x% done.
> 7 of the 9 machines were idle, and 2 of the machines were using 100% of 
> 1 cpu (1 job per machine).
> 
> So it appears that there was a synchronization failure, in that one 
> machine thought the transfer hadn't started and the other machine 
> thought it had.
> 
> We did have a momentary network outage on the switch during this job. We 
> tried stopping the hadoop processes on the machines with the sending 
> failures, and after 10 minutes they went 'dead' but the job never resumed.
> 
> Looking into the log files of the spinning machines, they were endlessly 
> trying to start a block move to any of a set of other machines in the 
> cluster. The shape of their repeating log messages is below.
> 
> 2007-12-03 15:42:44,755 INFO org.apache.hadoop.dfs.DataNode: Starting 
> thread to transfer block blk_3105072074036734167 to 
> [Lorg.apache.hadoop.dfs.DatanodeInfo;@6fc40f
> 2007-12-03 15:42:44,757 WARN org.apache.hadoop.dfs.DataNode: Failed to 
> transfer blk_3105072074036734167 to XX.YY.ZZ.AAA:50010 got 
> java.net.SocketException: Broken pipe
>       at java.net.SocketOutputStream.socketWrite0(Native Method)
>       at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>       at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>       at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>       at 
> org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1175)
>       at 
> org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1208)
>       at 
> org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1460)
>       at java.lang.Thread.run(Thread.java:619)
> 
> 
> -- On the machines that the transfers were targeted to, the following 
> was in the log file.
> 
> 2007-12-03 15:42:18,508 ERROR org.apache.hadoop.dfs.DataNode: 
> DataXceiver: java.io.IOException: Block blk_3105072074036734167 has 
> already been started (though not completed), and thus cannot be created.
>       at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:568)
>       at 
> org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1257)
>       at 
> org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:901)
>       at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
>       at java.lang.Thread.run(Thread.java:619)
> 


Re: Has anyone had hdfs block move synchronization failures with hadoop 0.15.0?

Posted by Jason Venner <ja...@attributor.com>.
There are no "timed out block" messages for that block number in the log.
There are some socket timeouts; the matching entries from that job's
datanode logfiles are below.

img47: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img47.log.2007-12-03:2007-12-03 
13:41:08,520 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_3105072074036734167 to 10.50.30.99:50010 got 
java.net.SocketTimeoutException: connect timed out
img47: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img47.log.2007-12-03:2007-12-03 
13:57:55,378 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_3105072074036734167 to 10.50.30.99:50010 got 
java.net.SocketTimeoutException: connect timed out
img47: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img47.log.2007-12-03:2007-12-03 
14:10:05,737 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_3105072074036734167 to 10.50.30.99:50010 got 
java.net.SocketTimeoutException: connect timed out
img47: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img47.log.2007-12-03:2007-12-03 
14:22:37,109 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_3105072074036734167 to 10.50.30.99:50010 got 
java.net.SocketTimeoutException: connect timed out
img49: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img49.log.2007-12-03:2007-12-03 
15:44:04,088 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: 
java.io.
img58: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img58.log.2007-12-03:2007-12-03 
13:29:04,158 INFO org.apache.hadoop.dfs.DataNode: Received block 
blk_3105072074036734167 from /10.50.30.100 and Read timed out
img52: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img52.log.2007-12-03:2007-12-03 
13:30:25,935 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_3105072074036734167 to 10.50.30.101:50010 got 
java.net.SocketTimeoutException: connect timed out
img53: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img53.log.2007-12-03:2007-12-03 
13:35:05,987 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_3105072074036734167 to 10.50.30.99:50010 got 
java.net.SocketTimeoutException: connect timed out
img53: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img53.log.2007-12-03:2007-12-03 
14:14:37,950 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_3105072074036734167 to 10.50.30.99:50010 got 
java.net.SocketTimeoutException: connect timed out
img56: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img56.log.2007-12-03:2007-12-03 
13:34:52,788 WARN 
org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_3105072074036734167 to 10.50.30.99:50010 got 
java.net.SocketTimeoutException: connect timed out
img56: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img56.log.2007-12-03:2007-12-03 
13:39:25,020 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_3105072074036734167 to 10.50.30.101:50010 got 
java.net.SocketTimeoutException: connect timed out
img56: 
/data1/image_hadoop/hadoop-0.15.0/logs/hadoop-argus-datanode-img56.log.2007-12-03:2007-12-03 
14:10:19,417 WARN org.apache.hadoop.dfs.DataNode: Failed to transfer 
blk_3105072074036734167 to 10.50.30.99:50010 got 
java.net.SocketTimeoutException: connect timed out


Hairong Kuang wrote:
> Hi Jason,
>
> Could you please check the namenode log to see if there is any message starting
> with "PendingReplicationMonitor timed out block"?
>
> Hairong
>
> -----Original Message-----
> From: Jason Venner [mailto:jason@attributor.com] 
> Sent: Tuesday, December 04, 2007 10:07 AM
> To: hadoop-user@lucene.apache.org
> Subject: Has anyone had hdfs block move synchronization failures with
> hadoop 0.15.0?
>
> We have a small cluster of 9 machines on a shared Gig Switch (with a lot
> of other machines)
>
> The other day, running a job, the reduce stalled, when the map was
> 99.99x% done.
> 7 of the 9 machines were idle, and 2 of the machines were using 100% of
> 1 cpu (1 job per machine).
>
> So it appears that there was a synchronization failure, in that one
> machine thought the transfer hadn't started and the other machine
> thought it had.
>
> We did have a momentary network outage on the switch during this job. We
> tried stopping the hadoop processes on the machines with the sending
> failures, and after 10 minutes they went 'dead' but the job never
> resumed.
>
> Looking into the log files of the spinning machines, they were endlessly
> trying to start a block move to any of a set of other machines in the
> cluster. The shape of their repeating log messages is below.
>
> 2007-12-03 15:42:44,755 INFO org.apache.hadoop.dfs.DataNode: Starting
> thread to transfer block blk_3105072074036734167 to
> [Lorg.apache.hadoop.dfs.DatanodeInfo;@6fc40f
> 2007-12-03 15:42:44,757 WARN org.apache.hadoop.dfs.DataNode: Failed to
> transfer blk_3105072074036734167 to XX.YY.ZZ.AAA:50010 got
> java.net.SocketException: Broken pipe
>        at java.net.SocketOutputStream.socketWrite0(Native Method)
>        at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>        at
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>        at
> java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>        at java.io.DataOutputStream.write(DataOutputStream.java:90)
>        at
> org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1175)
>        at
> org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1208)
>        at
> org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1460)
>        at java.lang.Thread.run(Thread.java:619)
>
>
> -- On the machines that the transfers were targeted to, the following
> was in the log file.
>
> 2007-12-03 15:42:18,508 ERROR org.apache.hadoop.dfs.DataNode: 
> DataXceiver: java.io.IOException: Block blk_3105072074036734167 has
> already been started (though not completed), and thus cannot be created.
>        at
> org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:568)
>        at
> org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1257)
>        at
> org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:901)
>        at
> org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
>        at java.lang.Thread.run(Thread.java:619)
>
>   

RE: Has anyone had hdfs block move synchronization failures with hadoop 0.15.0?

Posted by Hairong Kuang <ha...@yahoo-inc.com>.
Hi Jason,

Could you please check the namenode log to see if there is any message
starting with "PendingReplicationMonitor timed out block"?

Hairong

-----Original Message-----
From: Jason Venner [mailto:jason@attributor.com] 
Sent: Tuesday, December 04, 2007 10:07 AM
To: hadoop-user@lucene.apache.org
Subject: Has anyone had hdfs block move synchronization failures with
hadoop 0.15.0?

We have a small cluster of 9 machines on a shared Gig Switch (with a lot
of other machines)

The other day, running a job, the reduce stalled, when the map was
99.99x% done.
7 of the 9 machines were idle, and 2 of the machines were using 100% of
1 cpu (1 job per machine).

So it appears that there was a synchronization failure, in that one
machine thought the transfer hadn't started and the other machine
thought it had.

We did have a momentary network outage on the switch during this job. We
tried stopping the hadoop processes on the machines with the sending
failures, and after 10 minutes they went 'dead' but the job never
resumed.

Looking into the log files of the spinning machines, they were endlessly
trying to start a block move to any of a set of other machines in the
cluster. The shape of their repeating log messages is below.

2007-12-03 15:42:44,755 INFO org.apache.hadoop.dfs.DataNode: Starting
thread to transfer block blk_3105072074036734167 to
[Lorg.apache.hadoop.dfs.DatanodeInfo;@6fc40f
2007-12-03 15:42:44,757 WARN org.apache.hadoop.dfs.DataNode: Failed to
transfer blk_3105072074036734167 to XX.YY.ZZ.AAA:50010 got
java.net.SocketException: Broken pipe
       at java.net.SocketOutputStream.socketWrite0(Native Method)
       at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
       at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
       at
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
       at
java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
       at java.io.DataOutputStream.write(DataOutputStream.java:90)
       at
org.apache.hadoop.dfs.DataNode$BlockSender.sendChunk(DataNode.java:1175)
       at
org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1208)
       at
org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1460)
       at java.lang.Thread.run(Thread.java:619)


-- On the machines that the transfers were targeted to, the following
was in the log file.

2007-12-03 15:42:18,508 ERROR org.apache.hadoop.dfs.DataNode: 
DataXceiver: java.io.IOException: Block blk_3105072074036734167 has
already been started (though not completed), and thus cannot be created.
       at
org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:568)
       at
org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1257)
       at
org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:901)
       at
org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
       at java.lang.Thread.run(Thread.java:619)