Posted to user@hadoop.apache.org by Jason Yang <li...@gmail.com> on 2012/09/24 15:25:46 UTC
Not able to place enough replicas in Reduce
Hi, all
I have been stuck on a weird problem for a long time, so I was wondering
if anyone could give me some advice?
I have a MapReduce job, in which:
1. the mapper reads a whole file as a single split (by overriding
FileInputFormat.isSplitable to return false) and translates it into a set
of points, then clusters these points into 4 groups randomly. The output
of the mapper is <splitID_GroupID, point>, so if I have 20 input files,
there will be 20*4=80 unique keys in total.
2. I have implemented a KeyasNameMSFOutputFormat which returns the output
file name according to the key:
> public class KeyasNameMSFOutputFormat
>     extends MultipleSequenceFileOutputFormat<Text, Point>
> {
>     @Override
>     protected String generateFileNameForKeyValue(Text key,
>             Point value, String name)
>     {
>         return key.toString();
>     }
> }
3. I use the IdentityReducer, and the jobConf looks like:
> // set path for input and output
> ...
>
> // set format for input and output
> jobConf.setInputFormat(WholeFileInputFormat.class);
> jobConf.setOutputFormat(KeyasNameMSFOutputFormat.class);
>
> // set class of output key and value
> jobConf.setOutputKeyClass(Text.class);
> jobConf.setOutputValueClass(Point.class);
>
> // set mapper and reducer
> jobConf.setMapperClass(LocalClusteringMapper.class);
> jobConf.setReducerClass(IdentityReducer.class);
>
> // run the job
> JobClient.runJob(jobConf);
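As a sanity check on the numbers in step 1, the key scheme can be modeled in plain Java. This is only an illustrative sketch: the class name, the `makeKey` helper, and the `split-%03d` ID format are invented here; the real keys come from LocalClusteringMapper and use the actual split names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class KeySchemeSketch {
    // Build a key the way step 1 describes: split ID and group ID joined
    // into one string, which KeyasNameMSFOutputFormat later uses verbatim
    // as the output file name.
    static String makeKey(String splitId, int groupId) {
        return splitId + "-" + groupId;
    }

    public static void main(String[] args) {
        Set<String> keys = new LinkedHashSet<>();
        // 20 input files, each clustered into 4 groups -> 80 unique keys,
        // i.e. 80 distinct output files created by the single reducer.
        for (int file = 0; file < 20; file++) {
            for (int group = 0; group < 4; group++) {
                keys.add(makeKey(String.format("split-%03d", file), group));
            }
        }
        System.out.println(keys.size()); // 80
    }
}
```

So the one reducer ends up writing 80 separate output files, versus only 20 when the job is run with 5 input files.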
On the pseudo-distributed Hadoop cluster, this job runs fine with a small
number of input files (e.g. 5), but it always fails in the reduce phase
when there are more than 20 input files.
I have checked the logs and found some strange entries:
*----JobTracker Log----*
2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.JobInProgress: Task
'attempt_201209241945_0001_m_000019_0' has completed
task_201209241945_0001_m_000019 successfully.
2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.ResourceEstimator:
completedMapsUpdates:20 completedMapsInputSize:28649108
completedMapsOutputSize:41385220
2012-09-24 19:49:14,668 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201209241945_0001_r_000000_0:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
could only be replicated to 0 nodes, instead of 1
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
at org.apache.hadoop.ipc.Client.call(Client.java:740)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
*----DataNode log----*
2012-09-24 19:49:08,156 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-8681085577211416125_16056 src: /127.0.0.1:39500 dest: /127.0.0.1:50010
2012-09-24 19:49:08,223 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_5641670145666195897_16057 src: /127.0.0.1:39501 dest: /127.0.0.1:50010
2012-09-24 19:49:08,324 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_2433781882701878319_16058 src: /127.0.0.1:39504 dest: /127.0.0.1:50010
2012-09-24 19:49:08,389 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_7063241539808048732_16059 src: /127.0.0.1:39505 dest: /127.0.0.1:50010
2012-09-24 19:49:08,465 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
blk_-8805810864714094811_16060 src: /127.0.0.1:39506 dest: /127.0.0.1:50010
2012-09-24 19:49:09,850 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
127.0.0.1:39463, dest: /127.0.0.1:50010, bytes: 513321, op: HDFS_WRITE,
cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
DS-373612585-127.0.1.1-50010-1347885793411, blockid:
blk_5143138923881025306_16019
2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
PacketResponder 0 for block blk_5143138923881025306_16019 terminating
2012-09-24 19:49:09,941 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
127.0.0.1:39464, dest: /127.0.0.1:50010, bytes: 483928, op: HDFS_WRITE,
cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
DS-373612585-127.0.1.1-50010-1347885793411, blockid:
blk_1512576479762656406_16020
2012-09-24 19:49:09,944 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
block blk_1512576479762656406_16020 terminating
2012-09-24 19:49:09,975 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
127.0.0.1:39465, dest: /127.0.0.1:50010, bytes: 496747, op: HDFS_WRITE,
cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
DS-373612585-127.0.1.1-50010-1347885793411, blockid:
blk_8640569083674549899_16021
*----NameNode Log----*
2012-09-24 19:49:09,836 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=yanglin,yanglin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare ip=/
127.0.0.1 cmd=create
src=/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
dst=null perm=yanglin:supergroup:rw-r--r--
2012-09-24 19:49:09,845 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
enough replicas, still in need of 1
2012-09-24 19:49:09,845 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 8020, call
addBlock(/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2,
DFSClient_attempt_201209241945_0001_r_000000_0) from
127.0.0.1:37972:error: java.io.IOException: File
/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
could only be replicated to 0 nodes, instead of 1
java.io.IOException: File
/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
could only be replicated to 0 nodes, instead of 1
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to
blk_5143138923881025306_16019 size 513321
2012-09-24 19:49:09,852 INFO org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.completeFile: file
/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-000000-0
is closed by DFSClient_attempt_201209241945_0001_r_000000_0
*----taskTracker log----*
2012-09-24 19:49:00,813 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
MB/s) >
2012-09-24 19:49:02,451 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
1873109 bytes for reduce: 0 from map: attempt_201209241945_0001_m_000018_0
given 1873109/1873105
2012-09-24 19:49:02,452 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
dest: 127.0.0.1:48717, bytes: 1873109, op: MAPRED_SHUFFLE, cliID:
attempt_201209241945_0001_m_000018_0
2012-09-24 19:49:02,468 INFO org.apache.hadoop.mapred.TaskTracker: Sent out
1328490 bytes for reduce: 0 from map: attempt_201209241945_0001_m_000019_0
given 1328490/1328486
2012-09-24 19:49:02,469 INFO
org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
dest: 127.0.0.1:48717, bytes: 1328490, op: MAPRED_SHUFFLE, cliID:
attempt_201209241945_0001_m_000019_0
2012-09-24 19:49:03,054 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
MB/s) >
2012-09-24 19:49:03,641 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
MB/s) >
2012-09-24 19:49:06,828 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.7520079% reduce > reduce
2012-09-24 19:49:09,830 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
2012-09-24 19:49:12,445 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
2012-09-24 19:49:12,538 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201209241945_0001_r_1599614869 exited. Number of tasks it ran: 0
2012-09-24 19:49:15,538 INFO org.apache.hadoop.mapred.TaskRunner:
attempt_201209241945_0001_r_000000_0 done; removing files.
2012-09-24 19:49:15,558 INFO org.apache.hadoop.mapred.TaskTracker:
addFreeSlot : current free slots : 2
2012-09-24 19:49:17,677 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction (registerTask): attempt_201209241945_0001_r_000000_1
task's state:UNASSIGNED
2012-09-24 19:49:17,678 INFO org.apache.hadoop.mapred.TaskTracker: Trying
to launch : attempt_201209241945_0001_r_000000_1
In case it's relevant, here is my development environment:
Hadoop Version: 0.20.2
OS: Ubuntu 12.04 LTS
yanglin@ubuntu:~$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 32059
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10240
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 20480
cpu time (seconds, -t) unlimited
max user processes (-u) 32059
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
--
YANG, Lin
Re: Not able to place enough replicas in Reduce
Posted by Jason Yang <li...@gmail.com>.
Hey, Bertrand
here is the situation of my pseudo-distributed cluster:
-----
NameNode 'localhost:8020'
Started: Mon Sep 24 03:11:05 CST 2012 Version: 0.20.2, r911707 Compiled: Fri
Feb 19 08:07:34 UTC 2010 by chrisdo Upgrades: There are no upgrades in
progress.
------------------------------
Cluster Summary: 169 files and directories, 292 blocks = 461 total.
Heap Size is 54.5 MB / 888.94 MB (6%)
Configured Capacity: 18.98 GB
DFS Used: 56.7 MB
Non DFS Used: 14.61 GB
DFS Remaining: 4.32 GB
DFS Used%: 0.29 %
DFS Remaining%: 22.75 %
Live Nodes: 1
Dead Nodes: 0
------------------------------
NameNode Storage:
Storage Directory: /home/yanglin/Programs/hadoop_tmp_dir/hadoop-yanglin/dfs/name
Type: IMAGE_AND_EDITS
State: Active
and here is the configuration:
----
<!-- core-site.xml -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/yanglin/Programs/hadoop_tmp_dir/hadoop-${user.name}
</value>
</property>
</configuration>
<!-- hdfs-site.xml -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
<!-- mapred-site.xml -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
yanglin@ubuntu:~$ cat /home/yanglin/Programs/hadoop-0.20.2/conf/masters
localhost
yanglin@ubuntu:~$ cat /home/yanglin/Programs/hadoop-0.20.2/conf/slaves
localhost
-----
2012/9/24 Bertrand Dechoux <de...@gmail.com>
> And do you have any remaining space in your HDFS? (Or do you have quota?
> But the message should be different, I guess.)
> What are the metrics you get from the namenode? Are all datanodes (you
> have only one?) live?
>
> http://localhost:50070/dfshealth.jsp
>
> As long as you consume (map) you don't need much space in HDFS but when
> you produce (reduce) you definitely need some.
> As Ted pointed out, your error is a standard one when hadoop is unable to
> replicate a block. It should not be related to the reduce itself and even
> less related about your specific logic.
>
> Regards
>
> Bertrand
>
>
> On Mon, Sep 24, 2012 at 5:41 PM, Jason Yang <li...@gmail.com>wrote:
>
>> Hi, Ted
>>
>> here is the result of jps:
>> yanglin@ubuntu:~$ jps
>> 3286 TaskTracker
>> 14053 Jps
>> 2623 DataNode
>> 2996 JobTracker
>> 2329 NameNode
>> 2925 SecondaryNameNode
>> ---
>> It seems that the DN is working.
>>
>> And it does not fail immediately when entering the reduce phase; actually it
>> always fails after processing some data
>>
>>
>> 2012/9/24 Steve Loughran <st...@hortonworks.com>
>>
>>>
>>>
>>> On 24 September 2012 15:47, Ted Reynolds <te...@hortonworks.com> wrote:
>>>
>>>> Jason,
>>>>
>>>> The line in the JobTracker log - "Could only be replicated to 0 nodes,
>>>> instead of 1" points to a problem with your data node. It generally means
>>>> that your DataNode is either down or not functioning correctly. What is
>>>> the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
>>>>
>>>>
>>>
>>> see also: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>>>
>>> -steve
>>>
>>
>>
>>
>> --
>> YANG, Lin
>>
>>
>
>
> --
> Bertrand Dechoux
>
--
YANG, Lin
Re: Not able to place enough replicas in Reduce
Posted by Bertrand Dechoux <de...@gmail.com>.
And do you have any remaining space in your HDFS? (Or do you have quota?
But the message should be different, I guess.)
What are the metrics you get from the namenode? Are all datanodes (you have
only one?) live?
http://localhost:50070/dfshealth.jsp
As long as you consume (map) you don't need much space in HDFS but when you
produce (reduce) you definitely need some.
As Ted pointed out, your error is a standard one when hadoop is unable to
replicate a block. It should not be related to the reduce itself and even
less related about your specific logic.
Regards
Bertrand
On Mon, Sep 24, 2012 at 5:41 PM, Jason Yang <li...@gmail.com>wrote:
> Hi, Ted
>
> here is the result of jps:
> yanglin@ubuntu:~$ jps
> 3286 TaskTracker
> 14053 Jps
> 2623 DataNode
> 2996 JobTracker
> 2329 NameNode
> 2925 SecondaryNameNode
> ---
> It seems that the DN is working.
>
> And it does not fail immediately when entering the reduce phase; actually it
> always fails after processing some data
>
>
> 2012/9/24 Steve Loughran <st...@hortonworks.com>
>
>>
>>
>> On 24 September 2012 15:47, Ted Reynolds <te...@hortonworks.com> wrote:
>>
>>> Jason,
>>>
>>> The line in the JobTracker log - "Could only be replicated to 0 nodes,
>>> instead of 1" points to a problem with your data node. It generally means
>>> that your DataNode is either down or not functioning correctly. What is
>>> the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
>>>
>>>
>>
>> see also: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>>
>> -steve
>>
>
>
>
> --
> YANG, Lin
>
>
--
Bertrand Dechoux
Re: Not able to place enough replicas in Reduce
Posted by Jason Yang <li...@gmail.com>.
Hi, Ted
here is the result of jps:
yanglin@ubuntu:~$ jps
3286 TaskTracker
14053 Jps
2623 DataNode
2996 JobTracker
2329 NameNode
2925 SecondaryNameNode
---
It seems that the DN is working.
And it does not fail immediately when entering the reduce phase; actually it
always fails after processing some data.
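Worth noting that jps only shows the JVM is alive, not that the DataNode is actually serving blocks. A sketch of two further checks, assuming the 0.20 default NameNode HTTP port and a guessed log directory (both assumptions; adjust to your setup):

```shell
#!/bin/sh
# 1. Ask the NameNode web UI how many live datanodes it sees
#    (50070 is the default NameNode HTTP port in 0.20):
curl -s http://localhost:50070/dfshealth.jsp 2>/dev/null \
    | grep -i "Live Nodes" || echo "NameNode UI not reachable"

# 2. Scan the DataNode log for recent errors
#    (the log directory is an assumption -- check HADOOP_LOG_DIR):
LOG_DIR=${HADOOP_LOG_DIR:-/var/log/hadoop}
MATCHES=$(grep -iE "ERROR|Exception" "$LOG_DIR"/hadoop-*-datanode-*.log 2>/dev/null | tail -n 20)
[ -n "$MATCHES" ] && printf '%s\n' "$MATCHES" \
    || echo "no errors found (or no datanode log) under $LOG_DIR"
```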
2012/9/24 Steve Loughran <st...@hortonworks.com>
>
>
> On 24 September 2012 15:47, Ted Reynolds <te...@hortonworks.com> wrote:
>
>> Jason,
>>
>> The line in the JobTracker log - "Could only be replicated to 0 nodes,
>> instead of 1" points to a problem with your data node. It generally means
>> that your DataNode is either down or not functioning correctly. What is
>> the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
>>
>>
>
> see also: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
>
> -steve
>
--
YANG, Lin
Re: Not able to place enough replicas in Reduce
Posted by Steve Loughran <st...@hortonworks.com>.
On 24 September 2012 15:47, Ted Reynolds <te...@hortonworks.com> wrote:
> Jason,
>
> The line in the JobTracker log - "Could only be replicated to 0 nodes,
> instead of 1" points to a problem with your data node. It generally means
> that your DataNode is either down or not functioning correctly. What is
> the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
>
>
see also: http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo
-steve
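For completeness: one 0.20-era condition consistent with this job's pattern (a single-node cluster that fails only once the reduce holds many output files open at once) is the DataNode hitting its cap on concurrent transfer threads. A hedged sketch of the usual adjustment in hdfs-site.xml; the value 4096 is illustrative, not something from this thread:

```xml
<!-- hdfs-site.xml: raise the cap on concurrent DataNode transfer threads.
     The property name really is spelled "xcievers" in 0.20 (default 256). -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```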
Re: Not able to place enough replicas in Reduce
Posted by Ted Reynolds <te...@hortonworks.com>.
Jason,
The line in the JobTracker log - "Could only be replicated to 0 nodes,
instead of 1" points to a problem with your data node. It generally means
that your DataNode is either down or not functioning correctly. What is
the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
Ted Reynolds
Technical Support Engineer
Hortonworks
Work Phone: 408-645-7079
<http://hortonworks.com/download/>
On Mon, Sep 24, 2012 at 6:25 AM, Jason Yang <li...@gmail.com> wrote:
> Hi, all
>
> I have been stuck on a weird problem for a long time, so I was wondering
> if anyone could give me some advice.
>
> I have a MapReduce job, in which:
>
> 1. The mapper reads a whole file as a split (by overriding
> FileInputFormat.isSplitable) and translates it into a set of points, then
> clusters these points into 4 groups randomly. The output of the mapper is
> <splitID_GroupID, point>, so if I have 20 input files, there will be
> 20*4 = 80 unique keys in total.
>
> 2. I have implemented a KeyasNameMSFOutputFormat which returns the file
> name according to the key:
>
>> public class KeyasNameMSFOutputFormat
>>     extends MultipleSequenceFileOutputFormat<Text, Point>
>> {
>>     @Override
>>     protected String generateFileNameForKeyValue(Text key,
>>                                                  Point value, String name)
>>     {
>>         return key.toString();
>>     }
>> }
>
>
> 3. Use the IdentityReducer, and the jobConf is like:
>
>> // set path for input and output
>> ...
>>
>> // set format for input and output
>> jobConf.setInputFormat(WholeFileInputFormat.class);
>> jobConf.setOutputFormat(KeyasNameMSFOutputFormat.class);
>>
>> // set class of output key and value
>> jobConf.setOutputKeyClass(Text.class);
>> jobConf.setOutputValueClass(Point.class);
>>
>> // set mapper and reducer
>> jobConf.setMapperClass(LocalClusteringMapper.class);
>> jobConf.setReducerClass(IdentityReducer.class);
>>
>> // run the job
>> JobClient.runJob(jobConf);
>
>
> On the pseudo-distributed Hadoop cluster, this job runs fine with a small
> number of input files (e.g. 5), but it always fails in the reduce phase
> if there are more than 20 input files.
>
> I have checked the logs and found some strange entries:
>
> *----JobTracker Log----*
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201209241945_0001_m_000019_0' has completed
> task_201209241945_0001_m_000019 successfully.
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.ResourceEstimator:
> completedMapsUpdates:20 completedMapsInputSize:28649108
> completedMapsOutputSize:41385220
> 2012-09-24 19:49:14,668 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201209241945_0001_r_000000_0:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:740)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> at $Proxy1.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy1.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
>
> *----DataNode log----*
> 2012-09-24 19:49:08,156 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8681085577211416125_16056 src: /127.0.0.1:39500 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:08,223 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_5641670145666195897_16057 src: /127.0.0.1:39501 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,324 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_2433781882701878319_16058 src: /127.0.0.1:39504 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,389 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_7063241539808048732_16059 src: /127.0.0.1:39505 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,465 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8805810864714094811_16060 src: /127.0.0.1:39506 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:09,850 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39463, dest: /127.0.0.1:50010, bytes: 513321, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_5143138923881025306_16019
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> PacketResponder 0 for block blk_5143138923881025306_16019 terminating
> 2012-09-24 19:49:09,941 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39464, dest: /127.0.0.1:50010, bytes: 483928, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_1512576479762656406_16020
> 2012-09-24 19:49:09,944 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
> block blk_1512576479762656406_16020 terminating
> 2012-09-24 19:49:09,975 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39465, dest: /127.0.0.1:50010, bytes: 496747, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_8640569083674549899_16021
>
> *----NameNode Log----*
> 2012-09-24 19:49:09,836 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
> ugi=yanglin,yanglin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
> ip=/127.0.0.1 cmd=create
> src=/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> dst=null perm=yanglin:supergroup:rw-r--r--
> 2012-09-24 19:49:09,845 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
> enough replicas, still in need of 1
> 2012-09-24 19:49:09,845 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 8020, call
> addBlock(/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2,
> DFSClient_attempt_201209241945_0001_r_000000_0) from 127.0.0.1:37972: error: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to
> blk_5143138923881025306_16019 size 513321
> 2012-09-24 19:49:09,852 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-000000-0
> is closed by DFSClient_attempt_201209241945_0001_r_000000_0
>
>
> *----taskTracker log----*
> 2012-09-24 19:49:00,813 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:02,451 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1873109 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000018_0 given 1873109/1873105
> 2012-09-24 19:49:02,452 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1873109, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000018_0
> 2012-09-24 19:49:02,468 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1328490 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000019_0 given 1328490/1328486
> 2012-09-24 19:49:02,469 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1328490, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000019_0
> 2012-09-24 19:49:03,054 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:03,641 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:06,828 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.7520079% reduce > reduce
> 2012-09-24 19:49:09,830 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,445 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,538 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201209241945_0001_r_1599614869 exited. Number of tasks it ran: 0
> 2012-09-24 19:49:15,538 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_201209241945_0001_r_000000_0 done; removing files.
> 2012-09-24 19:49:15,558 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 2
> 2012-09-24 19:49:17,677 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201209241945_0001_r_000000_1
> task's state:UNASSIGNED
> 2012-09-24 19:49:17,678 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201209241945_0001_r_000000_1
>
> In case it is relevant, here is my development environment:
> Hadoop Version: 0.20.2
> OS: Ubuntu 12.04 LTS
>
> *yanglin@ubuntu:~$ ulimit -a*
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 32059
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 10240
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 20480
> cpu time (seconds, -t) unlimited
> max user processes (-u) 32059
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
> --
> YANG, Lin
>
>
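On the ulimit output quoted above: what matters most here is the DataNode's open descriptor count, which can be inspected like this on Linux, finding the pid via jps as shown earlier in the thread (a sketch, not something from the thread):

```shell
#!/bin/sh
# Count open file descriptors of the DataNode process (Linux /proc).
# jps prints "PID ClassName", as in the output shown earlier in the thread.
DN_PID=$(jps 2>/dev/null | awk '$2 == "DataNode" {print $1}')
if [ -n "$DN_PID" ]; then
    echo "DataNode fds: $(ls /proc/"$DN_PID"/fd 2>/dev/null | wc -l) (limit: $(ulimit -n))"
else
    echo "no DataNode process found via jps"
fi
```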
Re: Not able to place enough replicas in Reduce
Posted by Ted Reynolds <te...@hortonworks.com>.
Jason,
The line in the JobTracker log - "Could only be replicated to 0 nodes,
instead of 1" points to a problem with your data node. I generally means
that your DataNode is either down or not functioning correctly. What is
the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
Ted Reynolds
Technical Support Engineer
Hortonworks
Work Phone: 408-645-7079
<http://hortonworks.com/download/>
On Mon, Sep 24, 2012 at 6:25 AM, Jason Yang <li...@gmail.com>wrote:
> Hi, all
>
> I have been stuck by a weird problem for a long time. so I was wondering
> could anyone give me some advise?
>
> I have a MapReduce Job , in which:
>
> 1. the mapper would read a whole file as a split (by overriding
> FileInputFormat.isSplitable ) and translate it into a set of points, and
> then cluster these points into 4 groups randomly. The output of mapper is
> <splitID_GroupID, point>. so If I have 20 input files, then there will be
> 20*4=80 unique keys in total.
>
> 2. I have implemented a KeyasNameMSOutputFormat which return file name
> according to the key:
>
>> public class KeyasNameMSFOutputFormat
>> extends MultipleSequenceFileOutputFormat<Text, point>
>> {
>> protected String generateFileNameForKeyValue(Text key,
>> Point value, String name)
>> {
>> return key.toString();
>> }
>> }
>
>
> 3. Use the IdentityReducer. and the jobConf is like:
>
>> // set path for input and output
>> ...
>>
>> // set format for input and output
>> jobConf.setInputFormat(WholeFileInputFormat.class);
>> jobConf.setOutputFormat(KeyasNameMSFOutputFormat.class);
>>
>> // set class of output key and value
>> jobConf.setOutputKeyClass(Text.class);
>> jobConf.setOutputValueClass(Point.class);
>>
>> // set mapper and reducer
>> jobConf.setMapperClass(LocalClusteringMapper.class);
>> jobConf.setReducerClass(IdentityReducer.class);
>>
>> // run the job
>> JobClient.runJob(jobConf);
>
>
> On the pseudo-distributed hadoop cluster, this job runs fine with small
> number of input files(e.g. 5 input files), but it would always failed in
> the reduce phase if there are more than 20 input files.
>
> I have checked the log, and find some strange logs:
>
> *----JobTracker Log----*
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201209241945_0001_m_000019_0' has completed
> task_201209241945_0001_m_000019 successfully.
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.ResourceEstimator:
> completedMapsUpdates:20 completedMapsInputSize:28649108
> completedMapsOutputSize:41385220
> 2012-09-24 19:49:14,668 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201209241945_0001_r_000000_0:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:740)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> at $Proxy1.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy1.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
>
> *----DataNode log----*
> 2012-09-24 19:49:08,156 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8681085577211416125_16056 src: /127.0.0.1:39500 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:08,223 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_5641670145666195897_16057 src: /127.0.0.1:39501 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,324 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_2433781882701878319_16058 src: /127.0.0.1:39504 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,389 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_7063241539808048732_16059 src: /127.0.0.1:39505 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,465 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8805810864714094811_16060 src: /127.0.0.1:39506 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:09,850 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39463, dest: /127.0.0.1:50010, bytes: 513321, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_5143138923881025306_16019
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> PacketResponder 0 for block blk_5143138923881025306_16019 terminating
> 2012-09-24 19:49:09,941 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39464, dest: /127.0.0.1:50010, bytes: 483928, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_1512576479762656406_16020
> 2012-09-24 19:49:09,944 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
> block blk_1512576479762656406_16020 terminating
> 2012-09-24 19:49:09,975 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39465, dest: /127.0.0.1:50010, bytes: 496747, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_8640569083674549899_16021
>
> *----NameNode Log----*
> 2012-09-24 19:49:09,836 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
> ugi=yanglin,yanglin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
> ip=/127.0.0.1 cmd=create
> src=/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> dst=null perm=yanglin:supergroup:rw-r--r--
> 2012-09-24 19:49:09,845 WARNorg.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
> enough replicas, still in need of 1
> 2012-09-24 19:49:09,845 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 8020, call
> addBlock(/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2,
> DFSClient_attempt_201209241945_0001_r_000000_0) from 127.0.0.1:37972:error: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to
> blk_5143138923881025306_16019 size 513321
> 2012-09-24 19:49:09,852 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-000000-0
> is closed by DFSClient_attempt_201209241945_0001_r_000000_0
>
>
> *----taskTracker log----*
> 2012-09-24 19:49:00,813 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:02,451 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1873109 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000018_0 given 1873109/1873105
> 2012-09-24 19:49:02,452 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1873109, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000018_0
> 2012-09-24 19:49:02,468 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1328490 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000019_0 given 1328490/1328486
> 2012-09-24 19:49:02,469 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1328490, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000019_0
> 2012-09-24 19:49:03,054 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:03,641 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:06,828 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.7520079% reduce > reduce
> 2012-09-24 19:49:09,830 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,445 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,538 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201209241945_0001_r_1599614869 exited. Number of tasks it ran: 0
> 2012-09-24 19:49:15,538 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_201209241945_0001_r_000000_0 done; removing files.
> 2012-09-24 19:49:15,558 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 2
> 2012-09-24 19:49:17,677 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201209241945_0001_r_000000_1
> task's state:UNASSIGNED
> 2012-09-24 19:49:17,678 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201209241945_0001_r_000000_1
>
> In case it concern, here is my developing environment:
> Hadoop Version: 0.20.2
> OS: Ubuntu 12.04 LTS
>
> *yanglin@ubuntu:~$ ulimit -a*
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 32059
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 10240
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 20480
> cpu time (seconds, -t) unlimited
> max user processes (-u) 32059
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
> --
> YANG, Lin
>
>
Re: Not able to place enough replicas in Reduce
Posted by Ted Reynolds <te...@hortonworks.com>.
Jason,
The line in the JobTracker log - "Could only be replicated to 0 nodes,
instead of 1" points to a problem with your data node. I generally means
that your DataNode is either down or not functioning correctly. What is
the output of the "jps" command? ("jps" is found in <JAVA_HOME>/bin).
Ted Reynolds
Technical Support Engineer
Hortonworks
Work Phone: 408-645-7079
<http://hortonworks.com/download/>
On Mon, Sep 24, 2012 at 6:25 AM, Jason Yang <li...@gmail.com>wrote:
> Hi, all
>
> I have been stuck by a weird problem for a long time. so I was wondering
> could anyone give me some advise?
>
> I have a MapReduce Job , in which:
>
> 1. the mapper would read a whole file as a split (by overriding
> FileInputFormat.isSplitable ) and translate it into a set of points, and
> then cluster these points into 4 groups randomly. The output of mapper is
> <splitID_GroupID, point>. so If I have 20 input files, then there will be
> 20*4=80 unique keys in total.
>
> 2. I have implemented a KeyasNameMSOutputFormat which return file name
> according to the key:
>
>> public class KeyasNameMSFOutputFormat
>>     extends MultipleSequenceFileOutputFormat<Text, Point>
>> {
>>     @Override
>>     protected String generateFileNameForKeyValue(Text key,
>>                                                  Point value, String name)
>>     {
>>         return key.toString();
>>     }
>> }
>
>
> 3. Use the IdentityReducer; the jobConf is set up like this:
>
>> // set path for input and output
>> ...
>>
>> // set format for input and output
>> jobConf.setInputFormat(WholeFileInputFormat.class);
>> jobConf.setOutputFormat(KeyasNameMSFOutputFormat.class);
>>
>> // set class of output key and value
>> jobConf.setOutputKeyClass(Text.class);
>> jobConf.setOutputValueClass(Point.class);
>>
>> // set mapper and reducer
>> jobConf.setMapperClass(LocalClusteringMapper.class);
>> jobConf.setReducerClass(IdentityReducer.class);
>>
>> // run the job
>> JobClient.runJob(jobConf);
>
>
> On the pseudo-distributed Hadoop cluster, this job runs fine with a small
> number of input files (e.g. 5), but it always fails in the reduce phase
> when there are more than 20 input files.
>
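A side observation (an assumption worth checking, not a confirmed diagnosis): since every distinct key becomes its own output file, the single reducer holds one open HDFS output stream per key, so the write load on the lone pseudo-distributed DataNode scales with the number of inputs. A hypothetical sketch of that arithmetic:

```java
class WriterLoad {
    // Each distinct key maps to its own output file, so the lone reducer keeps
    // one SequenceFile writer (one DFS output stream) open per key until the
    // task commits.
    static int openWriters(int inputFiles, int groupsPerFile) {
        return inputFiles * groupsPerFile;
    }

    public static void main(String[] args) {
        System.out.println(openWriters(5, 4));   // 5 inputs  -> 20 concurrent streams
        System.out.println(openWriters(20, 4));  // 20 inputs -> 80 concurrent streams
    }
}
```

At 20 inputs that is 80 simultaneous block writes against a single DataNode, which is the regime where the failures start to appear.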
> I have checked the logs and found some strange entries:
>
> *----JobTracker Log----*
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201209241945_0001_m_000019_0' has completed
> task_201209241945_0001_m_000019 successfully.
> 2012-09-24 19:48:59,490 INFO org.apache.hadoop.mapred.ResourceEstimator:
> completedMapsUpdates:20 completedMapsInputSize:28649108
> completedMapsOutputSize:41385220
> 2012-09-24 19:49:14,668 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201209241945_0001_r_000000_0:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:740)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> at $Proxy1.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy1.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
>
>
> *----DataNode log----*
> 2012-09-24 19:49:08,156 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8681085577211416125_16056 src: /127.0.0.1:39500 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:08,223 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_5641670145666195897_16057 src: /127.0.0.1:39501 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,324 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_2433781882701878319_16058 src: /127.0.0.1:39504 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,389 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_7063241539808048732_16059 src: /127.0.0.1:39505 dest: /127.0.0.1:50010
> 2012-09-24 19:49:08,465 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
> blk_-8805810864714094811_16060 src: /127.0.0.1:39506 dest: /
> 127.0.0.1:50010
> 2012-09-24 19:49:09,850 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39463, dest: /127.0.0.1:50010, bytes: 513321, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_5143138923881025306_16019
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> PacketResponder 0 for block blk_5143138923881025306_16019 terminating
> 2012-09-24 19:49:09,941 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39464, dest: /127.0.0.1:50010, bytes: 483928, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_1512576479762656406_16020
> 2012-09-24 19:49:09,944 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for
> block blk_1512576479762656406_16020 terminating
> 2012-09-24 19:49:09,975 INFO
> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
> 127.0.0.1:39465, dest: /127.0.0.1:50010, bytes: 496747, op: HDFS_WRITE,
> cliID: DFSClient_attempt_201209241945_0001_r_000000_0, srvID:
> DS-373612585-127.0.1.1-50010-1347885793411, blockid:
> blk_8640569083674549899_16021
>
> *----NameNode Log----*
> 2012-09-24 19:49:09,836 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
> ugi=yanglin,yanglin,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
> ip=/127.0.0.1 cmd=create
> src=/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> dst=null perm=yanglin:supergroup:rw-r--r--
> 2012-09-24 19:49:09,845 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place
> enough replicas, still in need of 1
> 2012-09-24 19:49:09,845 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 8020, call
> addBlock(/work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2,
> DFSClient_attempt_201209241945_0001_r_000000_0) from 127.0.0.1:37972: error: java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> java.io.IOException: File
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-002033-2
> could only be replicated to 0 nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 2012-09-24 19:49:09,851 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to
> blk_5143138923881025306_16019 size 513321
> 2012-09-24 19:49:09,852 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> NameSystem.completeFile: file
> /work/lab/ecg/tmpoutput/lc/RR/_temporary/_attempt_201209241945_0001_r_000000_0/RR-LC-000000-0
> is closed by DFSClient_attempt_201209241945_0001_r_000000_0
>
>
> *----taskTracker log----*
> 2012-09-24 19:49:00,813 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:02,451 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1873109 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000018_0 given 1873109/1873105
> 2012-09-24 19:49:02,452 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1873109, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000018_0
> 2012-09-24 19:49:02,468 INFO org.apache.hadoop.mapred.TaskTracker: Sent
> out 1328490 bytes for reduce: 0 from map:
> attempt_201209241945_0001_m_000019_0 given 1328490/1328486
> 2012-09-24 19:49:02,469 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:50060,
> dest: 127.0.0.1:48717, bytes: 1328490, op: MAPRED_SHUFFLE, cliID:
> attempt_201209241945_0001_m_000019_0
> 2012-09-24 19:49:03,054 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:03,641 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.3% reduce > copy (18 of 20 at 0.54
> MB/s) >
> 2012-09-24 19:49:06,828 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.7520079% reduce > reduce
> 2012-09-24 19:49:09,830 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,445 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201209241945_0001_r_000000_0 0.85404265% reduce > reduce
> 2012-09-24 19:49:12,538 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201209241945_0001_r_1599614869 exited. Number of tasks it ran: 0
> 2012-09-24 19:49:15,538 INFO org.apache.hadoop.mapred.TaskRunner:
> attempt_201209241945_0001_r_000000_0 done; removing files.
> 2012-09-24 19:49:15,558 INFO org.apache.hadoop.mapred.TaskTracker:
> addFreeSlot : current free slots : 2
> 2012-09-24 19:49:17,677 INFO org.apache.hadoop.mapred.TaskTracker:
> LaunchTaskAction (registerTask): attempt_201209241945_0001_r_000000_1
> task's state:UNASSIGNED
> 2012-09-24 19:49:17,678 INFO org.apache.hadoop.mapred.TaskTracker: Trying
> to launch : attempt_201209241945_0001_r_000000_1
>
> In case it's relevant, here is my development environment:
> Hadoop Version: 0.20.2
> OS: Ubuntu 12.04 LTS
>
> *yanglin@ubuntu:~$ ulimit -a*
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 32059
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 10240
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 20480
> cpu time (seconds, -t) unlimited
> max user processes (-u) 32059
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>
> --
> YANG, Lin
>
>